find algorithm/solution

Pages: 12
Write a function: int solution(vector<int> &A);
that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [−1,000,000..1,000,000].


I wrote this:
1
2
3
4
5
6
7
8
9
#include <iostream>
#include <algorithm> 

int solution(vector<int> &A) {
    int i {1};
    while(true) 
    if(std::find(begin(A), end(A), i++) == end(A)) 
    return i-1;
}


I guess its performance is not good.
How correct is it to you? How would you write a solution for the exercise?
Last edited on
Set an array of N+2 bools, initialised to true (say).

Loop through the array A, switching the relevant bool to false if the element of A lies between 0 and N+1 inclusive.

Then loop through your array of bools from [1] to [N+1], returning the index of the first one that is true.

For "array", read vector.
Last edited on
Like so:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#include <iostream>
#include <vector>
using namespace std;

int solver( const vector<int> &A )
{
   int N = A.size();
   vector<bool> mask( N + 2, true );
   for ( int e : A ) if ( 0 <= e && e <= N + 1 ) mask[e] = false;
   for ( int i = 1; ; i++ ) if ( mask[i] ) return i;
}

int main()
{
   cout << solver( { 1, 3, 6, 4, 1, 2 } ) << '\n';
   cout << solver( { 1, 2, 3 } ) << '\n';
   cout << solver( { -1, -3 } ) << '\n';
}


5
4
1
> Write an efficient algorithm

Space efficient? Is the vector mutable (are we allowed to sort it in place?)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
#include <iostream>
#include <vector>
#include <cmath>
#include <algorithm>
using namespace std;

//======================================================================

// (1) Use a mask array.
//     Time O(N)
//     Efficient because a vector<bool> is about 1/32 the size of a vector<int>
//

int solution1( vector<int> &A )
{
   int N = A.size();
   vector<bool> mask( N + 2, true );
   for ( int e : A ) if ( 0 <= e && e <= N + 1 ) mask[e] = false;
   for ( int i = 1; ; i++ ) if ( mask[i] ) return i;
}

//======================================================================

// (2) Use the sign of A[] as the mask array.
//     Time O(N).
//     No extra space, but does change elements of A

int solution2( vector<int> &A )
{
   for ( int &e : A ) if ( e < 1 ) e = 0;        // Only positive retained
   for ( int i = 0; i < A.size(); i++ )          // Now use negative to indicate "taken"
   {
      int target = abs( A[i] );
      if ( target > 0 && target <= A.size() ) A[target-1] = A[target-1] ? -abs( A[target-1] ) : -1;
   }
   for ( int i = 1; i <= A.size(); i++ ) if ( A[i-1] >= 0 ) return i;
   return A.size() + 1;
}

//======================================================================

// (3) Sort A[] first
//     Time O(N logN)
//     No extra store. Rearranges A.

int solution3( vector<int> &A )
{
   sort( A.begin(), A.end() );
   auto it = A.begin();
   for ( int i = 1; ; i++ )
   {
      it = find( it, A.end(), i );
      if ( it == A.end() ) return i;
   }
}

//======================================================================

int main()
{
   vector<vector<int>> tests = { { 1, 3, 6, 4, 1, 2 }, { 1, 2, 3 }, { -1, -3 } };
   for ( auto &A : tests )
   {
      auto B = A, C = A;
      cout << solution1( A ) << '\t' << solution2( B ) << '\t' << solution3( C ) << '\n';
   }
}



I still prefer solution1 (extra bool array) - negligible proportional increase in storage and time complexity is O(N).

Obviously, you could pass by value rather than reference to avoid changing the original. Depends how much freedom you have with the function signature.
Last edited on
This algorithm (the counting sort) can be used for many, many problems and is important to add to your list of things you want to use (similar to binary search and other critical algorithms).
@JLBorges
>Space efficient? Is the vector mutable (are we allowed to sort it in place?)
It's not been stated. So create an algorithm as efficient as possible in any respect.

@lastchance
What is my algorithm's time performance (big O), please?
@frek, your algorithm has a time complexity O(N^2).
Last edited on
Thank you for your algorithms. But how did you find the first algorithm (O(N)), please? Did you already work on the same exercise?
@frek, I've never seen this exercise before. However, it is similar in essence to many integer-based problems, as @jonnin pointed out (although it's a simple ticking-off exercise here, not a sort, and hence can use space-efficient vector<bool>).

I'm not sure if if you are asking me to justify that my first algorithm is O(N), but it consists of a linear pass through an array of N numbers (so proportional to N), and a pass through at most N+1 elements, again asymptotically proportional to N. I suppose that there is also an initial assignment of just over N bools.

All stages are asymptotically proportional to N, so the whole scheme is O(N) in time (and, from the size of the extra array, in space).

Your method is, essentially, a nested loop, proportional to N actions in each loop, so would be O(N^2).
>I'm not sure if if you are asking me to justify that my first algorithm is O(N)
No, I just like to know where you brought the idea from? Or how could you find that algorithm for the exercise? Of course, I suppose the answer to both these question is practice. When someone practices well, brilliant ideas to solving exercies come to their mind fairly quickly. (I've experienced that).

> as @jonnin pointed out.

> Your method is, essentially, a nested loop, proportional to N actions in each loop, so would be O(N^2).

This part:
1
2
3
4
5
6
auto it = A.begin();
   for ( int i = 1; ; i++ )
   {
      it = find( it, A.end(), i );
      if ( it == A.end() ) return i;
   }

is resembling my code very much:

1
2
3
4
5
for ( int i = 1; i++ )  // equivalent to while(true) and one increment in each cycle 
   {
      if (find( A.begin(), A.end(), i ) == A.end())
      return i;
   }

Apart from the part for sorting the array.
If the array is by default {1, 2, 3, 4, ..., n-2, n}, the sort does almost nothing, and the find algorithm needs to iterate the array from begin through (only one less than) the end. And in measuring big O, we need to consider the worst case.
So if I'm not mistaken, the time complexity of that algorithm must be more than mine which was O(N^2).
Last edited on
No, the time complexity of std::sort is O(N log N).
Then I do a linear pass through the sorted array, which is O(N). This is less than O(N log N), so the overall complexity is O(N log N).

Your search part differs from mine in one crucial way ... I only go FORWARDS from the last iterator value (because I know that the array is now sorted). This makes my search stage O(N). However, you go back to the BEGINNING of the array every time (which you have to do because you cannot guarantee sortedness). My solution3, with sorting, is O(N log N) in time. Your algorithm is O(N^2).

Have you tried out these algorithms on whatever online tester you got the problem from?
Last edited on
If the array is by default {1, 2, 3, 4, ..., n-2, n}, the sort does almost nothing,


there are sorts that run faster if the data was nearly sorted to start. I do not know if std sort does or not, but in the case where this is true, the end result is O(N) for the sort. The problem is how you define 'nearly sorted'. It normally refers to data that was sorted and then modified slightly, a few elements changed and then you need to re-sort it, but 'a few' is subjective.

Credit: I didn't say that, lastchance did.

The problem can be solved in O(N) for speed. I am pretty sure that is as good as it gets -- you have to touch each item once.
Consider this given array {-1, -2, -3, ... , -n, 2} as well, if possible.

>I only go FORWARDS, ..., you go back to the BEGINNING
But you wrote:
find( it, A.end(), i ); // find( A.begin(), A.end(), i );
It seems to start from begining for each search. Probably you should have incremented the first iterator: find( it++, A.end(), i );

>Have you tried out these algorithms on whatever online tester you got the problem from?
I only tried my first algorithm which was good but not very good. It gives random exercises so I don't suppose it's possible to reach that exact exercise simply.
It does NOT start from the beginning each time. It starts from the variable 'it' (the last position of the iterator). Yours starts from A.begin().

No, I do not need your suggested change. My for-loop control already has an i++ in. I might choose to increment the iterator as well, avoiding a position that has already been tested, but it will make no difference to the time complexity.
Last edited on
Frek, in case you don't understand the lastchance's algorithm, here's an explanation.

N is the number of items in the array. Let's call R the largest value in thee array. The problem states that N is 1 to 100,000 and R is no more than 1,000,000.

The key insight is that you only need to check for values from 1 to N+1, not from 1 to R. Since there are only up to N values, at least one of the values from 1 to N+1 must not be in the array. So you simply setup a bool vector with N+2 items. Then make one pass through A. If A[i] is in the range 0 to N+2 then set the bit in the bool array. This part takes O(N) time.

Then make a pass through the bool array. The first value that isn't in the array is the solution to the problem. This takes at most O(N) time.
@dhayden
Thank you I already understood it. My question was something more like "how to be able to have our mind create correct solutions as quick as possible". This needs practice. :)

A binary gap within a positive integer N is any maximal sequence of consecutive zeros that is surrounded by ones at both ends in the binary representation of N.

For example, number 9 has binary representation 1001 and contains a binary gap of length 2. The number 529 has binary representation 1000010001 and contains two binary gaps: one of length 4 and one of length 3. The number 20 has binary representation 10100 and contains one binary gap of length 1. The number 15 has binary representation 1111 and has no binary gaps. The number 32 has binary representation 100000 and has no binary gaps.

Write a function:

int solution(int N);

that, given a positive integer N, returns the length of its longest binary gap. The function should return 0 if N doesn't contain a binary gap.

For example, given N = 1041 the function should return 5, because N has binary representation 10000010001 and so its longest binary gap is of length 5. Given N = 32 the function should return 0, because N has binary representation '100000' and thus no binary gaps.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..2,147,483,647].


I wrote this, which is fine seemingly:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int solution(int N) {

	auto binStr = std::bitset<32>(N).to_string();
	size_t count{ 0 }, temp{ 0 };

	for (size_t i = 0; i < binStr.size(); ++i)
		if (binStr[i] == '1' && i + 1 < binStr.size() && binStr[i + 1] == '0') {
			while (++i < binStr.size() && binStr[i] == '0')
				++temp;
			if (i < binStr.size() && binStr[i] == '1')
			{
				count = temp > count ? temp : count;
				temp = 0;
				--i;
			}
		}
	return count;
}


Any comments?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <iostream>
#include <algorithm>
using namespace std;

int solution( int N )
{
   int gap = 0, last = -1;
   for ( int i = 0; N; i++ )
   {
      if ( N % 2 )
      {
         if ( last >= 0 ) gap = max( i - last - 1, gap );
         last = i;
      }
      N /= 2;
   }
   return gap;
}


int main()
{
   for ( int N : { 9, 529, 20, 15, 32, 1041 } ) cout << N << " ---> " << solution( N ) << '\n';
}


9 ---> 2
529 ---> 4
20 ---> 1
15 ---> 0
32 ---> 0
1041 ---> 5
It's as efficient as mine but more modern with less number of code lines! Thanks.
Please tell me how do you think of the exercise that brilliant ideas on solving them come to your mind. I want to learn. Probably it's your first time facing such an exercise. For me too it was the first time, but my code has more lines and look normal while your professional. Thanks, again.
@Frek, your code and mine have the same time complexity, so "efficiency" isn't really relevant. Yours is a perfectly good way to solve the problem, and "counting lines" isn't a good comparator.

I code how I "see" the problem. The only thing which you won't see is that I developed the code by evolution. My first draft was the easy attack: I set up a vector to store the positions of the 1's. Then I worked out how to remove the vector, simplifying the solution.

I'm not a professional programmer, although I do write a lot of code to do my work and I prefer simplicity over "over-protectiveness" (the std:: and const splattered everywhere). But then I'm writing code to solve fluid-flow problems that only I and my postgrads will see the source code of; I would probably have a completely different perspective (and have to abide by an in-house coding style) if I were writing banking software.

Neither of us used comments - black marks for both of us. But I would encourage you to have a "complete" compileable program to test, especially as you have a list of test cases.

At the end of the day, the problems you posed in this thread are more mathematical challenges (and hence appealed to me) than "bread-and-butter" programming.
Last edited on
Pages: 12