Median of numbers

How to improve this code?
TASK: Find median of given numbers. Print the integer part of it.
Sample Input:
1
3
4
60
70
50
2

Sample Output:
1
2
3
3
4
27
4

Numbers in [0, 2^31].
There will be less than 50001 numbers.

Currently it takes around 29 seconds to run on a 50k-line input I generated myself. How can I get it down to 0.5 seconds? Or should I switch languages? I just learnt C++ in 2 days.

#include <iostream>
#include <vector>
#include <iomanip>
#include <algorithm>
#include <string>
#include <time.h>

using namespace std;

template<typename T,int N>
//template argument deduction
int size(T (&arr1)[N]) //Passing the array by reference
{
    return sizeof(arr1)/sizeof(arr1[0]); //Correctly returns the size of 'list'
    // or
    return N; //Correctly returns the size too [cool trick ;-)]
}

long int median(vector<long int> scores)
{
    size_t size = scores.size();

    if (size == 0)
    {
        return 0;  // Undefined, really.
    }
    else if (size == 1)
    {
        return scores[0];
    }
    else
    {
        std::sort(scores.begin(), scores.end());
        if (size % 2 == 0)
        {
            return (scores[size / 2 - 1] + scores[size / 2]) / 2;
        }
        else
        {
            return scores[size / 2];
        }
    }
}

int main() {
    clock_t t1, t2;
    t1 = clock();
    vector<long int> array;
    vector<long int>::iterator it;
    string line;
    try {
        while (getline(cin, line)) {
            it = array.end();
            array.insert(it, stoi(line));
            cout << median(array) << endl;
        }
    } catch (...) {

    }
    t2 = clock();
    float diff = ((float)t2-(float)t1) / CLOCKS_PER_SEC;
    cout<<diff<<endl;
    return 0;
}
UPDATE: The problem is originally from the UVa Online Judge, and I am running it on a Hong Kong online judge ('Private'?), but the number of inputs and the time allowed have been changed.
One way to speed it up is to call array.reserve(50000), or whatever the new maximum input size is, so there will be no more memory reallocations as the vector grows.
Also consider using push_back instead of insert.
Also, if you want speed, stop going through std::string and stoi() for the input and read directly into a variable of the proper type instead. And note that you're storing type long, not type int, so you should be using stol(), not stoi().

Also do you really need to use the type long? Using an int seems a better choice for the input you've shown.

Doesn't median require at least three elements in your array?

Lastly, you shouldn't be calling median() inside the loop; it should be called after the loop, especially since you're sorting the array inside that function.

std::nth_element() will typically execute in linear time.
http://en.cppreference.com/w/cpp/algorithm/nth_element

Usually implemented using introselect (median of medians + quickselect)
https://en.wikipedia.org/wiki/Introselect

Median of medians finds only an approximate median, in linear time, which is a limited but additional overhead for quickselect. When this approximate median is used as an improved pivot, the worst-case complexity of quickselect reduces significantly from quadratic to linear ... https://en.wikipedia.org/wiki/Median_of_medians


For example:

#include <iostream>
#include <vector>
#include <random>
#include <algorithm>
#include <ctime>

unsigned int median( std::vector<unsigned int>& seq )
{
    const auto n = seq.size() ;

    std::nth_element( seq.begin(), seq.begin() + n/2, seq.end() );
    const unsigned int m1 = seq[n/2] ;

    if( n%2 == 1 ) return m1 ; // if n is odd

    // even n
    std::nth_element( seq.begin(), seq.begin() + (n-1)/2, seq.end() );
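    // widen before adding: m1 + seq[...] can overflow a 32-bit unsigned int
    return static_cast<unsigned int>( ( static_cast<unsigned long long>(m1) + seq[ (n-1)/2 ] ) / 2 ) ;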
}

int main()
{
    const std::size_t n = 8'000'000 ; // median of 8 million elements
    
    std::mt19937 rng ; // deliberately not seeded

    // fill a vector with n random numbers
    std::vector<unsigned int> seq(n) ;
    for( unsigned int& v : seq ) v = rng() ;

    volatile unsigned int m = 0 ;

    {
        // sequence size is even
        const auto start = std::clock() ;
        m = median(seq) ;
        const auto end = std::clock() ;
        std::cout << (end-start) * 1000.0 / CLOCKS_PER_SEC << " milliseconds\n" ;

    }

    seq.push_back( rng() ) ;
    std::shuffle( seq.begin(), seq.end(), rng ) ;

    {
        // sequence size is odd
        const auto start = std::clock() ;
        m = median(seq) ;
        const auto end = std::clock() ;
        std::cout << (end-start) * 1000.0 / CLOCKS_PER_SEC << " milliseconds\n" ;
    }
    
    return m - m ;
}

http://coliru.stacked-crooked.com/a/b291a0cba80cdd10
The problem is that you're sorting the vector after reading each number. Since the average size of the vector is N/2 and you're sorting it N times, the complexity is something like N/2 * N * log2(N) = 25000 * 50000 * 15.61 ~ 19.5 * 10^9. If we optimistically say that a 4 GHz processor can execute 2 instructions per clock cycle, then even if the inner part of the sort function were just 1 instruction, it would still take about 2.4 seconds to run the program.

So you need a different algorithm, not just better code for the algorithm you have.

Since you print the median of each vector as you add to it, the trick to this problem is to figure out how to compute the median of the numbers A(1)...A(n) given that you've already computed the median of the numbers A(1)...A(n-1).

My advice is to store the numbers in a set rather than a vector (a std::multiset if the input can contain repeated values, since a std::set silently drops duplicates). Since a set's iterators remain valid when you insert a new item, you can keep an iterator that points to the median value (or the smaller of the two middle values). After inserting a new value, adjust the iterator depending on whether the new value is bigger or smaller than the old median, and on whether the new count is odd or even. You will have to increment or decrement the iterator at most once (make it point to the next larger or next smaller number). Then compute the new median from the iterator. Repeat the process.
