Seemingly Efficient Algorithm Gets Time Limit Exceeded

There's a competitive programming problem that reads as follows:

"In the beginning there are 'n' empty rows. In each step, either a number is added to the end of every row, or several numbers are deleted from the front of one row, in which case you should print the sum of the deleted numbers (note that the row might become empty).

Input is in the following form:
The first line contains two numbers, 'n' and 'q', the number of rows and the number of events respectively.
Each of the next q lines contains one of the following two events:
A) '1 x' means that x should be added to the end of every row.
B) '2 i j' means that j elements should be deleted from the front of row i (0 <= j <= row's size).

Expected Output:
For each event of type B (i.e. '2 i j'), print the sum of the deleted numbers."


Sample Input:
2 5
1 5
1 17
2 1 1
1 1
2 2 3

Sample Output:
5
23


Constraints:
Time: 1 Second
Memory: 256 MB
1 <= n, q <= 300 000
1 <= i <= n
1 <= x <= 10^9
Here's the solution I submitted but I get "Time Limit Exceeded":

#include <bits/stdc++.h>

using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int q; cin >> q >> q;     // I ignore 'n' since I don't need it; the second read overwrites the first
    queue<int> reference_line;     // All rows are identical until touched, so this one queue represents every unmodified row
    map<int, queue<int> > line;     // Rows that have had elements deleted get their own copy here; the key is the row index
    while (q--) {
        int v; cin >> v;
        if (v == 1) {
            cin >> v;
            reference_line.push(v);
        }
        else {
            cin >> v;
            if (line.find(v) == line.end()) line[v] = reference_line;     // first touch: copy the whole reference row
            int j; cin >> j;
            long long sum = 0;     // the sum can reach 3e14, beyond int range
            while (j-- && !line[v].empty()) {
                sum += line[v].front();
                line[v].pop();
            }
            cout << sum << '\n';
        }
    }
    return 0;
}
In case you're wondering why I didn't just create a plain array holding all n queues/rows: that version gets "Memory Limit Exceeded", since every type-1 event appends to all n rows, which can total up to n * q elements overall, far more than 256 MB allows.

Thanks in advance for all the replies.
I think there's a more efficient algorithm.

You need one row of numbers, keeping track of every value ever added: your reference line.

You need an array of size n, each element of which tells you how many numbers have already been removed from the front of that row.

No need for a map. No need to keep a complete record of each line that differs from the reference line.
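
In other words, something like this (just a sketch of the bookkeeping, with illustrative names, not a full solution):

#include <vector>

std::vector<int> reference_line;   // every value ever pushed (type-1 events)
std::vector<int> removed;          // removed[i] = how many elements row i has lost from its front

// Row i's current contents are reference_line[removed[i]] .. reference_line.back();
// deleting j elements from row i is just removed[i] += j.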

In addition to what Repeater said, you could also store the running sums instead of the numbers themselves. E.g., suppose the input numbers were:

4, 2, 3, 1, 5, 7, 3

You store the running sum like this:

4, 6, 9, 10, 15, 22, 25

Then if you need the sum from index 0 to index 3 (inclusive), you just print the value at index 3. If you need the sum from index 2 to index 6, you print the value at index 6 minus the value at index 1.

Also note that the sums need to be long long since they could be as high as 1e9 * 3e5 = 3e14.
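
In code, that lookup might look like this (a minimal sketch; rangeSum and prefix are illustrative names, not from the problem):

#include <vector>

// Sum of j consecutive values starting at index 'from', in O(1).
// prefix[k] holds the running sum of the values at indices 0..k.
long long rangeSum(const std::vector<long long>& prefix, int from, int j) {
    if (j <= 0) return 0;                              // nothing deleted, nothing to sum
    long long hi = prefix[from + j - 1];               // sum of values[0..from+j-1]
    long long lo = (from > 0) ? prefix[from - 1] : 0;  // sum of values[0..from-1]
    return hi - lo;
}

With the numbers above, rangeSum(prefix, 2, 5) returns 25 - 6 = 19, the sum of 3 + 1 + 5 + 7 + 3.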
@Repeater Fantastic idea. Thanks for the reply.

I altered my code according to your suggestion, but unfortunately it still gets "Time Limit Exceeded".

Here's the code after the tweaks:

#include <bits/stdc++.h>

using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int n, q; cin >> n >> q;
    vector<int> reference_line;     // every value ever pushed
    int line[n] = {};     // line[i] = elements already removed from row i's front (VLA: a GCC extension, fine on most judges)
    while (q--) {
        int v; cin >> v;
        if (v == 1) {
            cin >> v;
            reference_line.push_back(v);
        }
        else {
            cin >> v; v--;
            int j; cin >> j;
            long long sum = 0;     // the sum can reach 3e14, beyond int range
            for (int i = line[v]; i < line[v] + j; i++) {
                if (i >= (int)reference_line.size()) break;
                sum += reference_line[i];
            }
            line[v] += j;     // advance this row's front pointer past the deleted elements
            cout << sum << '\n';
        }
    }
    return 0;
}
Keeping the algorithm as-is, maybe try:

reserve reference_line's size up front (this is probably the biggest tweak)

fold the if/break into the for loop's stop condition (relying on && short-circuiting) and rearrange the logic as necessary; this may or may not help, as it's unclear how the compiler manages it. See the sketch below.

verify that the cout of sum is not a bottleneck and, if it is, handle it differently (probably not an issue)
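
The second tweak might look like this as a sketch (deleteFromFront is a hypothetical helper, not code from the thread; && evaluates left to right, so the size test guards the body without a separate if/break):

#include <vector>

long long deleteFromFront(const std::vector<int>& ref, int& front, int j) {
    long long sum = 0;
    // bounds check folded into the loop condition via short-circuiting &&
    for (int i = front, limit = front + j;
         i < limit && i < (int)ref.size(); ++i)
        sum += ref[i];
    front += j;   // advance this row's front pointer
    return sum;
}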

@tpb Thanks a lot for the great advice. Changed the code and it got "Accepted".
@jonnin Thanks for pointing those out.

How much of a difference would reserving the reference_line's size really make?

Also, the cout at the end shouldn't bottleneck the code; I tried to make the I/O as efficient as possible.
It's hard to say. vector has a built-in algorithm that attempts to manage memory sensibly, but every time you exceed the current capacity it does all the usual pointer junk:
allocates a new block of memory (more than currently used, plus some amount to allow for growth)
copies all the data to the new location
updates the vector's internal pointer to the new block
frees the old block

The copy is bad enough, but getting and freeing memory is usually "slow" (relatively, in terms of CPU clock cycles) as well, since it goes through the OS, which probably means interrupts and context switches and all that sort of badness. The memory manager also runs the request through an algorithm to decide which block to give you.

It doesn't happen every time through the loop, but doing it even once stinks when you are trying to beat a time constraint.

Also, push_back costs more than vector[x] = value. Not much more, but it all adds up. Here it's probably a zero-sum change, since you would have to track the used size yourself. If you didn't need the size at all, it would save having the object track it, but I don't see a way to avoid needing it here; that trick is more useful when you have a fixed amount of stuff.

There are a couple of other nanosecond-level things you could try. How long does it take currently? Can you profile it and spot the issue? I personally struggle to trust compilers because they used to be bad at their job. It shouldn't matter, but another thing I would have done here is declare ALL the variables at the start of the program, even the loop variable. It's probably optimized away, but just in case the compiler is feeling dumb today, I would do that to avoid it creating and destroying them each iteration (which I believe is just a push/pop of the program stack here). We are getting into the weeds now... you might also eyeball the assembly listing for anything dumb the compiler might be doing. This code is small enough that you can rewrite or modify it a couple of ways to see if you stumble on a form that the compiler is better able to optimize.
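
Concretely, the reserve suggestion is a one-line change (a sketch; q bounds the number of type-1 events, hence the number of pushes):

#include <iostream>
#include <vector>

int main() {
    int n, q;
    std::cin >> n >> q;
    std::vector<int> reference_line;
    reference_line.reserve(q);   // one allocation up front: push_back below
                                 // never triggers the grow-copy-free cycle
    // ... event loop unchanged ...
}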
@jonnin Thanks again for the awesome feedback.

Well, I read a bit more about vector memory allocation after your post, and you're totally right: reserving up front avoids the reallocations and is much more efficient.

However, in my last submission I didn't even use vectors (just plain arrays), and I made use of running totals as @tpb suggested earlier, and it got "Accepted".

Regardless, it was a great lesson. Thanks for sharing your thoughts, really appreciate it.
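
The accepted code itself isn't posted in the thread, but combining Repeater's offset array with tpb's running sums, it plausibly looked something like this (a reconstruction under those assumptions, not the actual submission):

#include <bits/stdc++.h>

using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int n, q;
    cin >> n >> q;
    vector<long long> prefix;          // prefix[k] = sum of the first k+1 pushed values
    prefix.reserve(q);                 // q bounds the number of pushes
    vector<int> removed(n, 0);         // removed[i] = elements already deleted from row i's front
    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            long long x; cin >> x;
            prefix.push_back(prefix.empty() ? x : prefix.back() + x);
        } else {
            int i, j; cin >> i >> j; i--;
            long long sum = 0;
            if (j > 0) {
                int lo = removed[i], hi = removed[i] + j - 1;   // j <= row size, so hi stays in range
                sum = prefix[hi] - (lo > 0 ? prefix[lo - 1] : 0);
            }
            removed[i] += j;           // delete by advancing the front pointer
            cout << sum << '\n';
        }
    }
    return 0;
}

Each event is handled in O(1), so the whole run is O(n + q), comfortably inside the one-second limit.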
Re-read tpb's post. No need to worry about memory allocation or context switches. Your current inner loop can scan a large chunk of the reference line on a single type-2 query, which is quadratic in the worst case; if you implement it the way tpb says, each operation, whether type 1 or type 2, requires only constant time.
@dhayden Yeah, exactly.