Working with arrays

Task description:
A non-empty array A consisting of N integers is given.
The leader of this array is the value that occurs in more than half of the elements of A.
An equi leader is an index S such that 0 ≤ S < N − 1 and two sequences A[0], A[1], ..., A[S] and A[S + 1], A[S + 2], ..., A[N − 1] have leaders of the same value.
For example, given array A such that:
A[0] = 4
A[1] = 3
A[2] = 4
A[3] = 4
A[4] = 4
A[5] = 2
we can find two equi leaders:
0, because sequences: (4) and (3, 4, 4, 4, 2) have the same leader, whose value is 4.
2, because sequences: (4, 3, 4) and (4, 4, 2) have the same leader, whose value is 4.
The goal is to count the number of equi leaders.
Write a function that, given the example above, returns 2, as explained above.


Based on my understanding, the function should return 2 for the array {4, 4, 2, 5, 3, 4, 4, 4}. Correct?
Indices 0 and 6.

What about index 2, i.e. sequences {4, 4, 2} and {5, 3, 4, 4, 4}?

(both have leader 4)

_______

In general, you should write a function that tests every index of the given array to check whether it is an "equi leader" or not. Sum up and return the number of times that the "equi leader" test was true.

For this it will probably be good to write a sub-function that returns the "leader" of a given sub-sequence:

#include <limits.h> /* INT_MAX */
#include <stddef.h> /* size_t */

#define NO_LEADER INT_MAX /* cf. https://youtu.be/TUXtA-6tHT4 */

int get_leader(const int *const array, const size_t index_start, const size_t index_end)
{
    /* ... */
}

int count_equi_leaders(const int *const array, const size_t len)
{
    int count = 0;
    for (size_t i = 0; i + 1 < len; ++i) /* i + 1 < len avoids size_t wrap-around when len == 0 */
    {
        const int leader_a = get_leader(array, 0, i);
        const int leader_b = get_leader(array, i+1, len-1);
        if ((leader_a != NO_LEADER) && (leader_a == leader_b))
        {
            ++count;
        }
    }
    return count;
}

(we reserve a special value, e.g. INT_MAX, for the case that the sequence has no leader)
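For completeness, here is one way get_leader could be filled in: a sketch using the Boyer-Moore majority-vote idea, with one pass to find a candidate and one pass to verify that it really occurs in more than half of the range (both bounds inclusive, matching the calls above):

int get_leader(const int *const array, const size_t index_start, const size_t index_end)
{
    /* Pass 1: Boyer-Moore majority vote to find a candidate. */
    int candidate = array[index_start];
    size_t votes = 0;
    for (size_t i = index_start; i <= index_end; ++i)
    {
        if (votes == 0) { candidate = array[i]; votes = 1; }
        else if (array[i] == candidate) { ++votes; }
        else { --votes; }
    }

    /* Pass 2: verify that the candidate occurs in MORE than half of the elements. */
    size_t occurrences = 0;
    for (size_t i = index_start; i <= index_end; ++i)
        if (array[i] == candidate) ++occurrences;

    const size_t length = index_end - index_start + 1;
    return (2 * occurrences > length) ? candidate : NO_LEADER;
}

Note that calling this once per split point makes the counting O(n^2) overall; the single-pass O(n) version further down avoids that.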
Last edited on
Assuming "more than half" means "strictly greater than half" then there are 3 equi-leaders (S=0, S=2 and S=6)

Subsequences:
{4}{4,2,5,3,4,4,4} (giving 4 and 4 )
{4,4,2}{5,3,4,4,4} (giving 4 and 4 )
{4,4,2,5,3,4,4}{4} (giving 4 and 4 )

Your function should return 3.
What about {4,4,2,5,3,4}{4,4}? There are a couple more like that?
jonnin wrote:
What about {4,4,2,5,3,4}{4,4}? There are a couple more like that?


Your first subsequence doesn't have a leader: 3 occurrences don't EXCEED half of 6.
ah, strictly more than half, ok.
Yes, right. I wrote this in O(n) time and O(1) space:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int solution(std::vector<int>& A) {
	if (A.size() <= 1) return 0;
	size_t count{ 0 };
	int dominator{ A[0] };

	// Boyer-Moore majority vote: find the candidate for the leader.
	for (size_t i = 0; i < A.size(); ++i)
		if (A[i] == dominator)
			++count;
		else if (count > 0)
			--count;
		else {
			dominator = A[i];
			count = 1;
		}

	const size_t dominater_cnt = std::count(A.begin(), A.end(), dominator);
	count = 0;
	int EquiLeader{ 0 };  // int, so the return below doesn't narrow

	for (size_t i = 0; i < A.size(); ++i) {
		if (A[i] == dominator)
			++count;
		if (count > (i + 1) / 2   // check left side
		      && dominater_cnt - count > (A.size() - i - 1) / 2)  // then right side
			++EquiLeader;
	}
	return EquiLeader;
}

int main()
{
	std::vector<std::vector<int>> A{
		{4, 3, 3, 2, 3, -1, 3, 3}, // 3
		{1, 1, 3, 4, 3, 3, 3},      // 0
		{4, 3, 4, 4, 4, 2}           // 2 
	};
	
	for (auto& a : A) std::cout << solution(a) << '\n';
}
Nice code! It's almost like I wrote it...

@lastchance ;)

Brackets are killing me though.
@frek,
Your first loop calculates a leader if there is one ... but it finds a fairly arbitrary member of the sequence if there isn't. So for the sequence
{1, 2, 3, 1, 2, 3, 4}
the dominator would be calculated as 4. (Print it out and try it.) It isn't actually a leader of that sequence, but it doesn't matter, as the second loop won't be able to find any equi-leaders anyway. However, that second loop will still run to completion unnecessarily.

So you could avoid doing the second loop at all if you add the line
if ( dominater_cnt <= A.size() / 2 ) return 0;
immediately after calculating dominater_cnt. This will roughly halve the number of operations in the case when the answer is 0 (which would be very common for a randomly generated sequence).
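In context, the patched region of solution would read:

	const size_t dominater_cnt = std::count(A.begin(), A.end(), dominator);
	if (dominater_cnt <= A.size() / 2) return 0;  // bail out early: no leader at all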


Your method is correctly O(n) - but given your (and @zapshe's) previous posts, I'm amused that you didn't put a constant before the 'n'.
lastchance wrote:
Your method is correctly O(n) - but given your (and @zapshe's) previous posts, I'm amused that you didn't put a constant before the 'n'.

Lmao, it's big-O notation: it's O(n) after reduction from O(3n). Nothing wrong with that when viewed through the lens of asymptotic time complexity. Big-O only cares about expressing time complexity as n -> infinity.

My argument has only been that the constant having no place in Big-O doesn't make it worthless.

It's the same with his O(1) space complexity: if an algorithm always allocated 1 billion elements, you'd still reduce that to O(1), as if it didn't matter. Even if they were just bools or chars, that would be 1 GB.

In theory, you could be a hack and restructure algorithms so that they loop and allocate based on large constants rather than n, letting you claim O(1) space complexity and even lower time complexities.

That wouldn't mean the constants are worthless; it simply means the constants are not affected by n, which is all Big-O requires.
zapshe wrote:
My argument has only been that the constant having no place in Big-O doesn't make it worthless.


My contention would be that the constant is indeed worthless - it gives the user a false view of computing time. Some operations (or contents of loops) take substantially longer than others, and some, indeed, might be optimised out.

The advantage of just saying O(N) is the proportionality implied: if you time a code with N=10000 then you know it will take roughly 100 times longer if N=1000000 and you can then decide whether it is practical and whether or not to go and have a cup of coffee (or, for some of my codes, a good night's sleep).
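In symbols, that is simply: if $T(N) \approx cN$ for some unknown constant $c$, then

$$\frac{T(100N)}{T(N)} = \frac{c \cdot 100N}{c \cdot N} = 100,$$

and $c$ cancels out of the prediction entirely, which is why quoting it adds nothing.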
lastchance wrote:
Some operations (or contents of loops) take substantially longer than others, and some, indeed, might be optimised out.

The issue is that this is a problem with Big-O itself. O(n) vs O(n^2) doesn't tell you which algorithm is "faster". It only tells you that EVENTUALLY O(n) will be faster than O(n^2).

Or, as my professor put it: since O(1.0000000000000001^n) is exponential, it will eventually overtake O(n)... but it may take a very long time.
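To put a rough number on "a very long time" (a back-of-the-envelope estimate): with base $1 + \varepsilon$, $\varepsilon = 10^{-16}$,

$$(1+\varepsilon)^n = e^{n \ln(1+\varepsilon)} \approx e^{\varepsilon n},$$

so the exponential only overtakes $n$ itself once $e^{\varepsilon n} > n$, i.e. around $n \approx 4 \times 10^{17}$.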


For all you know, the O(n^2) algorithm is much more optimized - the O(n) algorithm might take 1 second per iteration while the O(n^2) one takes 1 ms.

You can only compare algorithms via Big-O if they do roughly the same thing - such as sort a list - and are, ideally, well optimized for what they do.


With that in mind, a constant would not give a false view any more than regular Big-O would.
@zapshe,
You didn't read my post.

I am NOT comparing O(N) with O(N^2). I am comparing the SAME algorithm with different values of N.

If you go from N=10000 to N=1000000 an O(N) algorithm will take about 100 times longer, whereas an O(N^2) algorithm will take 10000 times longer. Deciding to write O(3N) rather than O(N) doesn't change that proportionality at all - in other words, that 3 is completely and utterly irrelevant, as well as being utterly impossible to substantiate when different loops do different things and take different amounts of time.

Don't put those numerical multipliers in order relations - it is completely pointless!

In a previous post @frek decided that (1/2)N(N-1) operations wasn't O(N^2). He was simply wrong. Neither the 1/2 nor the -1 enables you to predict the computation time.
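For the record, the bound is immediate:

$$\tfrac{1}{2}N(N-1) = \tfrac{1}{2}N^2 - \tfrac{1}{2}N, \qquad \tfrac{1}{4}N^2 \le \tfrac{1}{2}N(N-1) \le \tfrac{1}{2}N^2 \quad \text{for } N \ge 2,$$

so the count is $\Theta(N^2)$, and in particular $O(N^2)$: the 1/2 and the -1 disappear into the constants.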
@lastchance
I didn't add that line because the array is assumed, by default, to have a dominator element.
the constant matters when tweaking the code/approach. The constant does not matter when getting a general sense of what is possible. "Can I do the travelling salesman in O(n)?" is a totally different question from "can I reduce my algorithm from 17.2*N to 10*N operations?". The former asks what is theoretically possible for the problem at hand. The second is really implementation details (possibly forced by language, hardware, etc).
jonnin wrote:
the constant matters when tweaking the code/approach


The constant is irrelevant. No meaningful value can be computed if the operations done in a particular loop differ. It does not in any way predict the compute time.

Sure, you can usually reduce the calculations within a loop and that will reduce the calculation time (for that particular loop) in proportion. But the constant has no meaningful value. So, it shouldn't be there.
Let's take something really simple:
a standard sorted singly linked list, where you search for an item that isn't there and is bigger than the last value in the list. That is simply O(N), no constant in the way, etc.

Now take the same list and make it a skip-list with, say, increments of 100. The same search becomes approximately O(N/100): you check every 100th node until the target is between two of those, then linearly iterate over the up-to-100 nodes where it should be (or, in this specific case, there may be an early exit without that final iteration, but never mind that detail). In theoretical big-O the constant is not important; it's still just O(N). But in practice it's almost 100 times faster as the list gets larger (and no better if the list is < 200 or so). You can surely predict it will run faster this way. The constant matters here; otherwise the argument would be that a normal LL is just as good as a skip-list, which is not really true.

If your goal is to report the big-O, the constant does not belong. If you are trying to improve your code, or compare 2 things with equal big-O but different behaviors, it does make sense to mention it.
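For concreteness, here is a minimal sketch of the stride-100 search described above, using a sorted std::vector in place of the linked nodes (STRIDE and stride_search are just illustrative names):

#include <cstddef>
#include <vector>

constexpr std::size_t STRIDE = 100;  // the "increments of 100"

// Probe every STRIDE-th element, then scan linearly inside one block:
// roughly N/STRIDE + STRIDE comparisons instead of ~N for a plain scan.
bool stride_search(const std::vector<int>& v, int target)
{
	std::size_t block = 0;
	// "Express lane": jump ahead while the probed value is still <= target.
	while (block + STRIDE < v.size() && v[block + STRIDE] <= target)
		block += STRIDE;
	// Linear scan within the one block where the target must be, if present.
	for (std::size_t i = block; i < v.size() && v[i] <= target; ++i)
		if (v[i] == target)
			return true;
	return false;
}

Both this and the plain linear scan are O(N), which is exactly the point: the notation cannot distinguish them, yet one does about a hundredth of the work on large inputs.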
@jonnin, that’s not true.

If you started with 1.0N, then you go to (1/100)N.
But if you started with 3N, then you go to (3/100)N.

So what’s the constant? 1, 3, 1/100, 3/100, …? None of them - it’s utterly meaningless.

You do know, however, that the time is proportional to N. So, O(N) is all you can say: “linear time”.
It's meaningful in the context of the problem at hand. If you changed from N to 3N items, it's still 1 and 1/100; you have simply changed 10000 to 30000 for the value of N. The constants are how the two algorithms compare against each other: for whatever N, changing N does not change their relative comparison.
-------------------------------------------------------------
Or, to make it simpler, forget the constant then. Without using any constants, explain to me why a skip-list is better than a linked list for searching. Or explain why it isn't. Both are O(N), so they are the same, right?
I don't understand how there can be such a discrepancy between us.

Asymptotic time complexity doesn't care about constants; it cares about the relationship to n as n goes to infinity.

As programmers, we should care about the constants. Sorting our list in O(100n) is vastly different from sorting it in O(n). Your argument that the "100" doesn't give us any information sounds willfully ignorant. At the very least, you're incurring the overhead of 100 times as many loop iterations.

You'd also be arguing that when timing O(100n) and O(n), you'd have no hint at all as to which may perform better - and indeed may be surprised to find a correlation between higher constants and longer run time.