Removing Duplicates

Pages: 12
The vector contains 1,4,9,16,9,7,4,9,11 im trying to remove the duplicates so that the vector only has 1,4,9,16,7,11. This is what i have so far.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
 vector<int>num;
num.push_back(1);
num.push_back(4);
num.push_back(9);
num.push_back(16);
num.push_back(9);
num.push_back(7);
num.push_back(4);
num.push_back(9);
num.push_back(11);

for(unsigned int i=0;i<number.size()-1;i++){
    
    for(unsigned int x=0;x<i;x++){
        
         if(num[x]==num[i]){
      
           num.pop_back();
         }
}
}
closed account (E0p9LyTq)
A bit of C++11 overkill....

copy your vector contents into an unordered set, a container that can't have duplicated items. Clear your vector and push the contents of your unordered set back into your vector:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#include <iostream>
#include <vector>
#include <unordered_set>

int main()
{
   std::vector<int> num = {1, 4, 9, 16, 9, 7, 4, 9, 11 };
   
   std::cout << "num vector contains:\n";
   for (const int& x: num)
   {
      std::cout << " " << x;
   }
   std::cout << "\n";

   std::unordered_set<int> myset;

   myset.insert(num.begin(), num.end());
   
   while (!num.empty())
   {
      num.pop_back();
   }

   std::cout << "\nmyset contains:\n";
   for (const int& x: myset)
   {
      std::cout << " " << x;
      num.push_back(x);
   }
   std::cout << "\n";
   
   std::cout << "\nnum vector now contains:\n";
   for (const int& x: num)
   {
      std::cout << " " << x;
   }
   std::cout << "\n";
}

num vector contains:
 1 4 9 16 9 7 4 9 11

myset contains:
 11 7 16 9 4 1

num vector now contains:
 11 7 16 9 4 1

Using a set container instead of an unordered set would sort the contents, low to high.
Last edited on
num.clear() would clear the vector in one go
Hello fivestar,

Another approach based on what you started with:

1
2
3
4
5
6
7
8
9
10
	for (size_t lpo = 0; lpo < num.size(); lpo++)
	{
		for (size_t lp = lpo + 1; lp < num.size(); lp++)  //  lp needs to stay 1 ahead of lpo.
		{
			if (num[lpo] == num[lp])
			{
				num.erase(num.begin() + lp);  //  Also resizes the vector.
			}
		}
	}


Also line one should be vector<int> num notice the space between > and n.

Hope that helps,

Andy

Forgot to mention num.pop_back() only removes the last element in the vector.
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <iostream>
#include <vector>
#include <algorithm>

int main()
{
    {
        // order of elements is not preserved
        std::vector<int> num = { 1, 4, 9, 16, 9, 7, 4, 9, 11 };
        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;

        std::sort( std::begin(num), std::end(num) ) ;
        // http://en.cppreference.com/w/cpp/algorithm/unique
        num.erase( std::unique( std::begin(num), std::end(num) ), std::end(num) ) ;

        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;
    }

    {
        // order of unique elements is preserved
        std::vector<int> num = { 1, 4, 9, 16, 9, 7, 4, 9, 11 };
        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;

        auto end_unique = std::end(num) ;
        for( auto iter = std::begin(num) ; iter != end_unique ; ++iter )
        {
            // http://en.cppreference.com/w/cpp/algorithm/remove
            end_unique = std::remove( iter+1, end_unique, *iter ) ;
        }
        num.erase( end_unique, std::end(num) ) ;

        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;
    }
}

http://coliru.stacked-crooked.com/a/c7eb2072cac2022c
Consider std::unique
It moves all the duplicated items to the end of the container, and returns an iterator.
All you need to do it to remove everything after that iterator.
Consider std::unique
It moves all the duplicated items to the end of the container, and returns an iterator.
All you need to do it to remove everything after that iterator.
Removes all but the first element from every consecutive group of equivalent elements in the range [first,last).

http://www.cplusplus.com/reference/algorithm/unique/


Actually, you can just sort() it before calling unique(), though if OP wants to preserve the order of elements, this method won't work.
Last edited on
Actually, you can just sort() it before calling unique(), though if OP wants to preserve the order of elements, this method won't work.


Agreed, the only solution that does everything is, as always, JLBorges'
Rather naive and "old school", but no re-ordering.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <iostream>
#include <vector>
using namespace std;


template <class T> void noDuplicates( vector<T> &v )       // Remove duplicates without reordering
{
   unsigned int i, x;
   bool newValue;

   i = 1;
   while ( i < v.size() )
   {   
      newValue = true;
      x = -1;
      while ( ++x < i && newValue )
      {
         if ( v[x] == v[i] )
         {
            v.erase( v.begin()+i );
            newValue = false;
         }
      }
      if ( newValue ) i++;
   }
}


int main()
{
   unsigned int i;
   // Initialise. Yes, I know you can initialise better with C++11, but it's just an example
   vector<int>num;
   int a[] = { 1, 4, 9, 16, 9, 9, 7, 4, 9, 11 };
   for ( i = 0; i < 10; i++ ) num.push_back( a[i] );

   noDuplicates<int>( num );
   
   for ( i = 0; i < num.size(); i++ ) cout << num[i] << endl;
}
Last edited on
closed account (48T7M4Gy)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> num = {1, 4, 9, 9, 3, 3, 5, 16, 9, 9, 7, 4, 9, 11};
    
    //DISPLAY INITIAL VECTOR
    for (auto iter = num.begin(); iter != num.end(); iter++)
    {
        std::cout << *iter << ' ';
    }
    std::cout << "\n";
    
    //DETECT DUPLICATES
    for (auto iterA = num.begin(); iterA != num.end(); iterA++)
    {
        for (auto iterB = iterA + 1; iterB != num.end(); iterB++)
        {
            if(*iterB == *iterA)
            {
                num.erase(iterB);
                iterB--; //<--
            }
        }
    }
    
    //DISPLAY REVISED VECTOR
    for (auto iter = num.begin(); iter != num.end(); iter++)
    {
        std::cout << *iter << ' ';
    }
    std::cout << "\n";
    
    return 0;

}


See:http://stackoverflow.com/questions/16476099/remove-duplicate-entries-in-a-c-vector
Last edited on
Thanks, Kemort. I really need to get to grips with the STL and use of iterators. My coding is still too much influenced by other programming languages.

Is there any significance to your highlighted line 23? (I'm not sure whether it was intended for me.)
iterB--; //<--
I don't need a similar line in my code because if I erase an element then my counter i will just stay the same and it will simply be looking at the next element.
if ( newValue ) i++; // i would stay the same if a duplicate element was found
I don't have to "back up" an iterator. I'm deliberately using a while loop, not a for loop, and, unlike the link you allude to, i won't change during this loop if an element has to be erased. (Feel free to give me an example of an array which won't work if I'm wrong here - happens a lot!)

Please let me know if there is a bug in my function routine. (I'm not bothered by main()).
Last edited on
closed account (48T7M4Gy)
Hi lastchance,

The iterB-- is crucial to the success of the duplicate removal.

With the OP sample vector your program, jlborges program and my program all give the same result.

However, I'm sorry to say, with the sample vector I use in my post jlborges and I get the same correct result while yours doesn't.

I haven't looked at yours carefully but a couple of points:
1. while loops and for loops are generally interchangeable - aside from just obviously being required to get the same answer. The choice of loop isn't significant. It's a matter of style.
2. So you can probably adjust by decrementing the iterator in a similar way to mine. I quickly tried without success, so it needs a little more time to resolve.
3. The adjustment must be made because of the way erase works. See http://www.cplusplus.com/reference/vector/vector/erase/
Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions.
...
Return value
An iterator pointing to the new location of the element that followed the last element erased by the function call.


The <algorithm> remove then erase functions evidently do the same but the decrement operation is built in - the nature/purpose of STL.

:)
Last edited on
> The <algorithm> remove then erase functions evidently do the same

The typical implementation of std::remove() is somewhat more sophisticated;
see 'Possible implementation' in http://en.cppreference.com/w/cpp/algorithm/remove

The difference is remarkable when there are many repetitions (as in the snippet below).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
#include <iostream>
#include <string>
#include <ctime>
#include <vector>
#include <algorithm>
#include <random>
#include <iomanip>

void crude_brute_force( std::vector<int>& num )
{
   for (auto iterA = num.begin(); iterA != num.end(); iterA++)
    {
        for (auto iterB = iterA + 1; iterB != num.end(); iterB++)
        {
            if(*iterB == *iterA)
            {
                num.erase(iterB);
                iterB--; //<--
            }
        }
    }
}

void std_algorithm_remove( std::vector<int>& num )
{
    auto end_unique = std::end(num) ;
    for( auto iter = std::begin(num) ; iter != end_unique ; ++iter )
    {
        // http://en.cppreference.com/w/cpp/algorithm/remove
        end_unique = std::remove( iter+1, end_unique, *iter ) ;
    }
    num.erase( end_unique, std::end(num) ) ;
}

int main()
{
    const std::size_t N = 1000 ;
    const int M = 300 ;

    std::vector<int> vec ;
    for( int i = 0 ; i < M ; ++i ) vec.insert( vec.end(), N, i ) ;
    std::shuffle( std::begin(vec), std::end(vec), std::mt19937{} ) ; // deliberately not seeded

    {
        auto cpy = vec ;
        const auto start = std::clock() ;
        crude_brute_force(cpy) ;
        const auto end = std::clock() ;
        std::cout << "   crude_brute_force: " << std::setw(5) << (end-start) * 1000.0 / CLOCKS_PER_SEC << " millisecs.\n" ;
    }

    {
        auto cpy = vec ;
        const auto start = std::clock() ;
        std_algorithm_remove(cpy) ;
        const auto end = std::clock() ;
        std::cout << "std_algorithm_remove: " << std::setw(5) << (end-start) * 1000.0 / CLOCKS_PER_SEC << " millisecs.\n" ;
    }
}

echo -e '\n' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp -lsupc++ && ./a.out
echo -e '\n\n' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out


   crude_brute_force:  6310 millisecs.
std_algorithm_remove:    70 millisecs.



   crude_brute_force:  6080 millisecs.
std_algorithm_remove:    60 millisecs.

http://coliru.stacked-crooked.com/a/a9c31e9e5e17758c
Hello Kemort

When you say that my function doesn't work with your vector, would you be able to post in the whole code that you used to check that. Note that, because of the non-C++11 way that I initialised my vector, you will have to increase the number of push_back in main (my line 35) to accommodate the length of your test vector. That is not related to my routine.

I appreciate your sentence underlined. However, the purpose of using the while loop was that, unlike a for loop, it doesn't automatically increment the count variable or iterator at the end of the loop.
I appreciate your sentence underlined. However, the purpose of using the while loop was that, unlike a for loop, it doesn't automatically increment the count variable or iterator at the end of the loop.

A for loop automatically performs whatever update code you've given it at the end of the loop. A while loop automatically performs whatever code you've given it in the condition at the top of the loop. With respect to finding duplicates, your "automatic" increment of the index in the conditional accomplishes the same effect as an "automatic" increment of an iterator at the end of a for loop iteration.
closed account (48T7M4Gy)
This is what I ran on my XCode:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <iostream>
#include <vector>
using namespace std;


template <class T> void noDuplicates( vector<T> &v )       // Remove duplicates without reordering
{
    unsigned int i, x;
    bool newValue;
    
    i = 1;
    while ( i < v.size() )
    {
        newValue = true;
        x = -1;
        while ( ++x < i && newValue )
        {
            if ( v[x] == v[i] )
            {
                v.erase( v.begin()+i );
                newValue = false;
            }
        }
        if ( newValue ) i++;
    }
}


int main()
{
    unsigned int i;
    // Initialise. Yes, I know you can initialise better with C++11, but it's just an example
    vector<int>num;
    int a[] = { 1, 4, 9, 9, 3, 3, 5, 16, 9, 9, 7, 4, 9, 11 }; // <-- *** ONLY CHANGE ***
    for ( i = 0; i < 10; i++ ) num.push_back( a[i] );
    
    noDuplicates<int>( num );
    
    for ( i = 0; i < num.size(); i++ ) cout << num[i] << endl;
}



1
4
9
3
5
16
Program ended with exit code: 0


You are right about the vector intialisation. If it is initialised thus:

1
2
3
4
5
6
7
8
9
10
int main()
{
    unsigned int i;
    // Initialise. Yes, I know you can initialise better with C++11, but it's just an example
    vector<int>num = { 1, 4, 9, 9, 3, 3, 5, 16, 9, 9, 7, 4, 9, 11 };
    
    noDuplicates<int>( num );
    
    for ( i = 0; i < num.size(); i++ ) cout << num[i] << endl;
}


We get:
1
4
9
3
5
16
7
11
Program ended with exit code: 0


Kemort

Please read my post. You only allowed for 10 push_back in main, so your vector isn't complete. That needs to be increased to 14 in my line 35. I have checked it in the cpp shell and it then gives the correct answer.

Cire, please look at my line 24. I DON'T increment i if there is a duplicate; that is, if newValue goes false.

JLBorges, is there any chance of directly using my function in your timing routine? I am intrigued to know how a hard-coded loop compares with the STL.
Last edited on
closed account (48T7M4Gy)
I'm not surprised to see evidence that the timings for the brute force vs <algorithm> approach are substantially different - the STL wasn't built for nothing.

As long as the answer is right, despite the obvious knowledge gain, the nominal 6 seconds of my life has not been a complete waste of time even though the sample vectors used by me have somewhat less than 1000 elements so the lost time is not quite that much, as significant as it is. :)
closed account (48T7M4Gy)
lastchance
I think our posts have crossed but so there is no misunderstanding about where I am on this.

Please read my post.
I did.

You only allowed for 10 push_back in main, so your vector isn't complete.
I think I have answered that at http://www.cplusplus.com/forum/beginner/203059/#msg965754 I decided to initialize the vector in a consistent way but the nett result on that is the same.

That needs to be increased to 14 in my line 35.

Not the way I have done it.

I have checked it in the cpp shell
Excellent.

and it then gives the correct answer.
Just as I have done and written above.

Cheers

PS
I have convinced myself that your program solves the problem without the decrement I needed to use in mine.

I think the timings will be a substantially longer than the STL but it will be interesting to see where they fit.

You can download the source code for the STL which I have just done. My view is it is better perhaps to use it rather than fathom the code even though remove seems to have the decrement built in which is not unexpected by me.
Last edited on
All three versions yield the same result.
http://coliru.stacked-crooked.com/a/84eb5d49b6a5bfe3


> is there any chance of directly using my function in your timing routine?
> I am intrigued to know how a hard-coded loop compares with the STL.

A carefully written hand coded loop would, in most cases, be as efficient as the equivalent standard algorithm.
Hint:
a. There is no need to call erase() every time a duplicate is found. We can overwrite the duplicate value, and then, at the very end, do a single bulk erase of the values at the back of the vector.
b. When a duplicate is found, we need to immediately move to the left only the values up to the next duplicate. This way, when an nth duplicate is found, we can move the values to the right of it by n positions; this would reduce the number of moves.
c. With end_unique = std::remove( iter+1, end_unique, *iter ) ;, the number of values to be scanned for duplicates is reduced; the elements already marked for removal are excluded from subsequent scans. This too can be done if the call to std::remove is replaced with a hand coded loop.
c. C++11: Since the values at the back are to be removed, the only requirement is that they be safely destructible. So the values can be moved to the left (instead of copied, which needs to preserve the original value). (This does not make any difference when the values are of type int).

Anyway, here are some numbers:
      version_one_kemort:  2760 millisecs.
  version_two_std_remove:    50 millisecs.
version_three_lastchance:  5660 millisecs.



      version_one_kemort:  2860 millisecs.
  version_two_std_remove:    30 millisecs.
version_three_lastchance:  5460 millisecs.

http://coliru.stacked-crooked.com/a/efca89f08ef4a835
Pages: 12