Check if a vector contains duplicate numbers

If a vector contains duplicate numbers, return true; otherwise return false.

For example:
nums = [1,2,3,1]
true

nums = [1,2,3,4]
false


My code:
bool containsDuplicate(vector<int>& nums) {
    int counter = 0;

    for (int i = 0; i <= nums.size() - 2; i++)
        for (int j = 1; j <= nums.size() - 1; j++)
            if (nums[i] == nums[j]) counter++;

    if (counter == 0)
        return false;
    else
        return true;
}

I don't know what I did wrong in the program; my logic seems to be correct, but the answer is wrong when I test it. Any help please? Thank you.
#include <vector>
#include <unordered_set>

bool contains_duplicate( const std::vector<int>& nums ) {

    for( std::size_t i = 0 ; i < nums.size() ; ++i )
       for( std::size_t j = i+1 ; j < nums.size() ; ++j ) // for each number after the one at i
           if( nums[i] == nums[j] ) return true ; // found a duplicate

    return false ;
}

bool contains_duplicate_2( const std::vector<int>& nums ) {
    
    return std::unordered_set<int>{ nums.begin(), nums.end() }.size() < nums.size() ;
}
If the vector is in sorted order (or can be sorted), then std::adjacent_find() could be used.

#include <vector>
#include <algorithm>
#include <iostream>
#include <iomanip>

bool containsDuplicate(std::vector<int>& nums) {
	if (!std::is_sorted(nums.begin(), nums.end()))
		std::sort(nums.begin(), nums.end());

	return std::adjacent_find(nums.begin(), nums.end()) != nums.end();
}

int main() {
	std::vector v1 { 1, 2, 3, 1 };
	std::vector v2 { 1, 2, 3, 4 };

	std::cout << "v1 " << std::boolalpha << containsDuplicate(v1) << '\n';
	std::cout << "v2 " << std::boolalpha << containsDuplicate(v2) << '\n';
}



v1 true
v2 false

if the number of possible values can be quantified in a simple way, counting sort solves this in one pass.
basically
int ctsrt[number of values]{}; // eg, say your data is 0-5 or whatever, number of values is 6, right?
then
for(all the data && !hazdupz)
{
    ctsrt[data]++;
    if(ctsrt[data] > 1)
        hazdupz = true;
}

counting sort can be replaced with a variety of things with O(1) lookup time if a C array offends you or if the data is not simply quantified. A derpy check can be used if it is simply quantifiable as well: if you have 0-5 possible values and the vector size > 6, you must therefore have a duplicate. The reverse is not ensured, of course.
If the unique-set method is used, then doing a checked insert loop is more efficient, as it terminates when a duplicate is found rather than building the entire set and then checking its number of elements:

#include <vector>
#include <iostream>
#include <iomanip>
#include <unordered_set>

bool containsDuplicate(const std::vector<int>& nums) {
	std::unordered_set<int> si;

	for (const auto& n : nums)
		if (!si.insert(n).second)
			return true;

	return false;
}

int main() {
	std::vector v1 { 1, 2, 3, 1 };
	std::vector v2 { 1, 2, 3, 4 };

	std::cout << "v1 " << std::boolalpha << containsDuplicate(v1) << '\n';
	std::cout << "v2 " << std::boolalpha << containsDuplicate(v2) << '\n';
}

As jonnin says, if the range of the vector elements is constrained to be within a smallish range, then direct counting can be done. In this example, the range is restricted to simply a uint8_t type, which has a range of 0 - 255 (ie 256 values):

#include <vector>
#include <iostream>
#include <iomanip>

bool containsDuplicate(const std::vector<uint8_t>& nums) {
	size_t dups[256] {};

	for (const auto& n : nums)
		if (++dups[n] > 1)
			return true;

	return false;
}

int main() {
	std::vector<uint8_t> v1 { 1, 2, 3, 1 };
	std::vector<uint8_t> v2 { 1, 2, 3, 4 };

	std::cout << "v1 " << std::boolalpha << containsDuplicate(v1) << '\n';
	std::cout << "v2 " << std::boolalpha << containsDuplicate(v2) << '\n';
}

@ihavesmallbrain - is this just an exercise/test or is this for production code? If for production code, approx how many elements are we dealing with? ie is potential performance an issue?
#include <vector>
#include <iostream>
#include <iomanip>

bool containsDuplicate(const std::vector<int>& nums) {
	for (size_t i = 0; i < nums.size() - 1; ++i)
		for (size_t j = i + 1; j < nums.size(); ++j)
			if (nums[i] == nums[j])
				return true;

	return false;
}

int main() {
	std::vector v1 { 1, 2, 3, 1 };
	std::vector v2 { 1, 2, 3, 4, 4 };

	std::cout << "v1 " << std::boolalpha << containsDuplicate(v1) << '\n';
	std::cout << "v2 " << std::boolalpha << containsDuplicate(v2) << '\n';
}

Either:
bool containsDuplicate( const std::vector<int>& nums ) {

    if( nums.empty() ) return false ; // *** added  *** (avoid ub if vector is empty)

	for (size_t i = 0; i < nums.size() - 1; ++i)
		for (size_t j = i + 1; j < nums.size(); ++j)
			if (nums[i] == nums[j])
				return true;

	return false;
}


or:
bool containsDuplicate( const std::vector<int>& nums ) {

	for ( std::ptrdiff_t i = 0; i < std::ptrdiff_t( nums.size() ) - 1; ++i )
	for ( std::ptrdiff_t j = i+1; j < std::ptrdiff_t( nums.size() ); ++j )
			if (nums[i] == nums[j])
				return true;

	return false;
}
Sorting the vector and operating on it is O(n log n). Brute forcing the duplicates check is O(n^2), but may be faster for smaller n. As usual, would need to measure with real data for your use case. [And if the range of the numbers is bounded, then you could say it's O(n), to create and iterate over the necessary array.]
It doesn't have to be bounded to get O(n). It's just easier to explain and code for beginners. Any approach with O(1) retrieval can stow and count less friendly data.
There are constraints for the vector:
1 <= nums.length <= 105
-109 <= nums[i] <= 109

My approach is O(n^2), which gets "time limit exceeded". Should I sort the vector first or use an unordered_set? Which method is faster and uses less space?
Why do you guys want to know the size of the vector? Just wondering.
> The test cases are hidden so I don't know how big the vector is
So create your own large test cases.

Eg
std::vector<int> tests;
for ( int i = 0 ; i < 1E6 ; i++) tests.push_back(i);

This creates a test with no duplicates.

You can then mutate it, say, to make two elements the same:
 
tests[1000] = tests[100000];


Both approaches presented above should be fine.

Sorting is perhaps simpler, as it doesn't involve creating new data, if you're allowed to modify the vector you're given. All the operations seem to be O(N) or O(NlogN), so basically O(NlogN).

The unordered set might not be as efficient.
https://en.cppreference.com/w/cpp/container/unordered_set/unordered_set
The complexity of the iterator-range constructor used by JLBorges could be an issue for very large vectors with no duplicates:

average case linear, worst case quadratic in distance between first and last


-109 <= nums[i] <= 109


Then an int8_t type can be used:

#include <vector>
#include <iostream>
#include <iomanip>

bool containsDuplicate(const std::vector<int8_t>& nums) {
	size_t dups[256] {};

	for (const auto& n : nums)
		if (++dups[n + 128] > 1)
			return true;

	return false;
}

int main() {
	std::vector<int8_t> v1 { 1, 2, 3, 1 };
	std::vector<int8_t> v2 { 1, 2, 3, 4 };

	std::cout << "v1 " << std::boolalpha << containsDuplicate(v1) << '\n';
	std::cout << "v2 " << std::boolalpha << containsDuplicate(v2) << '\n';
}

> 1 <= nums.length <= 105
> -109 <= nums[i] <= 109

A 'time limit exceeded' error with such a small vector!

It probably is
1 <= nums.length <= 10^5
-10^9 <= nums[i] <= 10^9
That sort of makes more sense...

Then that rules out the counting (too much space requirement) and the nested for-loop (too slow) methods.

That just leaves the set and the sort methods. For the sort method, you can assume that the vector isn't sorted and so leave out the is_sorted() test.

If you can modify the data, then I'd try the sort method first:

bool containsDuplicate(std::vector<int>& nums) {
	std::sort(nums.begin(), nums.end());

	return std::adjacent_find(nums.begin(), nums.end()) != nums.end();
}


If you can't modify the data, then you're left with the set method.
> If you can't modify the data, then you're left with the set method.

If we can't modify the data, we can copy it and sort the copy.
For 10^5 random integers, making a copy and sorting appears to be significantly faster on the average (libstdc++).

#include <iostream>
#include <vector>
#include <random>
#include <unordered_set>
#include <algorithm>
#include <ctime>

bool contains_duplicate_sort( std::vector<int> nums ) {

    std::sort(nums.begin(), nums.end());
    return std::adjacent_find(nums.begin(), nums.end()) != nums.end();
}

bool contains_duplicate_set( const std::vector<int>& nums ) {

	std::unordered_set<int> si;

	for( const auto& n : nums ) if (!si.insert(n).second) return true;

	return false;
}

int main()
{
    const std::size_t N = 100'000 ; // 10^5 integers
    std::mt19937 rng( std::random_device{}() ) ;
    std::vector<int> numbers(N) ;
    
    double time_set = 0 ;
    double time_sort = 0 ;
    const int NRUNS = 100 ;
    
    for( int i = 0 ; i < NRUNS ; ++i )
    {
        for( auto& v : numbers ) v = int( rng() ) ; // fill with a fresh set of random integers 
    
        {
            const auto start = std::clock() ;
            volatile bool b = contains_duplicate_sort(numbers) ; // passed by value
            const auto end = std::clock() ;
            time_sort += (end-start) * 1000.0 / CLOCKS_PER_SEC ;
        }

        {
            const auto start = std::clock() ;
            volatile bool b = contains_duplicate_set(numbers) ; 
            const auto end = std::clock() ;
            time_set += (end-start) * 1000.0 / CLOCKS_PER_SEC ;
        }
    
    }

    std::cout << "sort: " << time_sort / NRUNS << " millisecs\n"
              << " set: " << time_set / NRUNS << " millisecs\n" ;
} 


g++ -std=c++20 -O3 -Wall -Wextra -pedantic-errors -Wno-unused-variable main.cpp && ./a.out
echo =====================================
clang++ -std=c++2a -O3 -Wall -Wextra -pedantic-errors -Wno-unused-variable main.cpp && ./a.out
sort: 6.94039 millisecs
 set: 17.4275 millisecs
=====================================
sort: 7.10618 millisecs
 set: 19.1317 millisecs

http://coliru.stacked-crooked.com/a/fa506d45b7aa05e3

Either should comfortably meet the performance constraint.
Yes. With VS2022 and Windows 7 on my laptop I get:


sort: 6.9  millisecs
 set: 8.02 millisecs


and when tried with std::set (instead of std::unordered_set), set is even slower (as expected).

I'm fairly surprised - as I was thinking that a copy and a sort would have been slower. But as with most things C++, don't assume and always measure!

NB. Passing by ref instead of by value gives me:


sort: 6.82 millisecs
 set: 8.12 millisecs


so the copying overhead is fairly minimal.

PS. For comparison purposes, using the nested for-loop method gives:


sort: 1640.17 millisecs
 set: 9.8 millisecs


which is approx 240 times slower!
Reminds me of a CppCon talk by Andrei Alexandrescu.
https://youtu.be/FJJTYQYB1JQ?t=1504
(In it, he calls "make heap" before calling insertion sort)