Removing Duplicates

Pages: 12

Nov 22, 2016 at 10:39pm

The vector contains 1,4,9,16,9,7,4,9,11 im trying to remove the duplicates so that the vector only has 1,4,9,16,7,11. This is what i have so far.

 vector<int>num;
num.push_back(1);
num.push_back(4);
num.push_back(9);
num.push_back(16);
num.push_back(9);
num.push_back(7);
num.push_back(4);
num.push_back(9);
num.push_back(11);

for(unsigned int i=0;i<number.size()-1;i++){
    
    for(unsigned int x=0;x<i;x++){
        
         if(num[x]==num[i]){
      
           num.pop_back();
         }
}
}

Nov 22, 2016 at 11:12pm

closed account (E0p9LyTq)

A bit of C++11 overkill....

copy your vector contents into an unordered set, a container that can't have duplicated items. Clear your vector and push the contents of your unordered set back into your vector:

#include <iostream>
#include <vector>
#include <unordered_set>

int main()
{
   std::vector<int> num = {1, 4, 9, 16, 9, 7, 4, 9, 11 };
   
   std::cout << "num vector contains:\n";
   for (const int& x: num)
   {
      std::cout << " " << x;
   }
   std::cout << "\n";

   std::unordered_set<int> myset;

   myset.insert(num.begin(), num.end());
   
   while (!num.empty())
   {
      num.pop_back();
   }

   std::cout << "\nmyset contains:\n";
   for (const int& x: myset)
   {
      std::cout << " " << x;
      num.push_back(x);
   }
   std::cout << "\n";
   
   std::cout << "\nnum vector now contains:\n";
   for (const int& x: num)
   {
      std::cout << " " << x;
   }
   std::cout << "\n";
}

Edit & run on cpp.sh

num vector contains:
 1 4 9 16 9 7 4 9 11

myset contains:
 11 7 16 9 4 1

num vector now contains:
 11 7 16 9 4 1

Using a set container instead of an unordered set would sort the contents, low to high.

Last edited on Nov 22, 2016 at 11:18pm

Nov 23, 2016 at 1:01am

gunnerfunner (2127)

num.clear() would clear the vector in one go

Nov 23, 2016 at 1:51am

Handy Andy (5051)

Hello fivestar,

Another approach based on what you started with:

	for (size_t lpo = 0; lpo < num.size(); lpo++)
	{
		for (size_t lp = lpo + 1; lp < num.size(); lp++)  //  lp needs to stay 1 ahead of lpo.
		{
			if (num[lpo] == num[lp])
			{
				num.erase(num.begin() + lp);  //  Also resizes the vector.
			}
		}
	}

Also line one should be vector<int> num notice the space between > and n.

Hope that helps,

Andy

Forgot to mention num.pop_back() only removes the last element in the vector.

Last edited on Nov 23, 2016 at 1:58am

Nov 23, 2016 at 1:58am

JLBorges (13770)

#include <iostream>
#include <vector>
#include <algorithm>

int main()
{
    {
        // order of elements is not preserved
        std::vector<int> num = { 1, 4, 9, 16, 9, 7, 4, 9, 11 };
        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;

        std::sort( std::begin(num), std::end(num) ) ;
        // http://en.cppreference.com/w/cpp/algorithm/unique
        num.erase( std::unique( std::begin(num), std::end(num) ), std::end(num) ) ;

        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;
    }

    {
        // order of unique elements is preserved
        std::vector<int> num = { 1, 4, 9, 16, 9, 7, 4, 9, 11 };
        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;

        auto end_unique = std::end(num) ;
        for( auto iter = std::begin(num) ; iter != end_unique ; ++iter )
        {
            // http://en.cppreference.com/w/cpp/algorithm/remove
            end_unique = std::remove( iter+1, end_unique, *iter ) ;
        }
        num.erase( end_unique, std::end(num) ) ;

        for( int v : num ) std::cout << v << ' ' ;
        std::cout << '\n' ;
    }
}

Edit & run on cpp.sh

http://coliru.stacked-crooked.com/a/c7eb2072cac2022c

Nov 23, 2016 at 2:11am

liuyang (217)

Consider std::unique
It moves all the duplicated items to the end of the container, and returns an iterator.
All you need to do it to remove everything after that iterator.

Nov 23, 2016 at 2:58am

integralfx (897)

Consider std::unique
It moves all the duplicated items to the end of the container, and returns an iterator.
All you need to do it to remove everything after that iterator.

Removes all but the first element from every consecutive group of equivalent elements in the range [first,last).

http://www.cplusplus.com/reference/algorithm/unique/

Actually, you can just sort() it before calling unique(), though if OP wants to preserve the order of elements, this method won't work.

Last edited on Nov 23, 2016 at 2:59am

Nov 23, 2016 at 4:48am

gunnerfunner (2127)

Actually, you can just sort() it before calling unique(), though if OP wants to preserve the order of elements, this method won't work.

Agreed, the only solution that does everything is, as always, JLBorges'

Nov 23, 2016 at 10:08am

lastchance (6980)

Rather naive and "old school", but no re-ordering.

#include <iostream>
#include <vector>
using namespace std;


template <class T> void noDuplicates( vector<T> &v )       // Remove duplicates without reordering
{
   unsigned int i, x;
   bool newValue;

   i = 1;
   while ( i < v.size() )
   {   
      newValue = true;
      x = -1;
      while ( ++x < i && newValue )
      {
         if ( v[x] == v[i] )
         {
            v.erase( v.begin()+i );
            newValue = false;
         }
      }
      if ( newValue ) i++;
   }
}


int main()
{
   unsigned int i;
   // Initialise. Yes, I know you can initialise better with C++11, but it's just an example
   vector<int>num;
   int a[] = { 1, 4, 9, 16, 9, 9, 7, 4, 9, 11 };
   for ( i = 0; i < 10; i++ ) num.push_back( a[i] );

   noDuplicates<int>( num );
   
   for ( i = 0; i < num.size(); i++ ) cout << num[i] << endl;
}

Edit & run on cpp.sh

Last edited on Nov 23, 2016 at 10:09am

Nov 23, 2016 at 1:54pm

closed account (48T7M4Gy)

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> num = {1, 4, 9, 9, 3, 3, 5, 16, 9, 9, 7, 4, 9, 11};
    
    //DISPLAY INITIAL VECTOR
    for (auto iter = num.begin(); iter != num.end(); iter++)
    {
        std::cout << *iter << ' ';
    }
    std::cout << "\n";
    
    //DETECT DUPLICATES
    for (auto iterA = num.begin(); iterA != num.end(); iterA++)
    {
        for (auto iterB = iterA + 1; iterB != num.end(); iterB++)
        {
            if(*iterB == *iterA)
            {
                num.erase(iterB);
                iterB--; //<--
            }
        }
    }
    
    //DISPLAY REVISED VECTOR
    for (auto iter = num.begin(); iter != num.end(); iter++)
    {
        std::cout << *iter << ' ';
    }
    std::cout << "\n";
    
    return 0;

}

Edit & run on cpp.sh

See:http://stackoverflow.com/questions/16476099/remove-duplicate-entries-in-a-c-vector

Last edited on Nov 23, 2016 at 1:57pm

Nov 23, 2016 at 4:55pm

lastchance (6980)

Thanks, Kemort. I really need to get to grips with the STL and use of iterators. My coding is still too much influenced by other programming languages.

Is there any significance to your highlighted line 23? (I'm not sure whether it was intended for me.)
iterB--; //<--
I don't need a similar line in my code because if I erase an element then my counter i will just stay the same and it will simply be looking at the next element.
if ( newValue ) i++; // i would stay the same if a duplicate element was found
I don't have to "back up" an iterator. I'm deliberately using a while loop, not a for loop, and, unlike the link you allude to, i won't change during this loop if an element has to be erased. (Feel free to give me an example of an array which won't work if I'm wrong here - happens a lot!)

Please let me know if there is a bug in my function routine. (I'm not bothered by main()).

Last edited on Nov 23, 2016 at 5:16pm

Nov 24, 2016 at 1:36am

closed account (48T7M4Gy)

Hi lastchance,

The iterB-- is crucial to the success of the duplicate removal.

With the OP sample vector your program, jlborges program and my program all give the same result.

However, I'm sorry to say, with the sample vector I use in my post jlborges and I get the same correct result while yours doesn't.

I haven't looked at yours carefully but a couple of points:
1. while loops and for loops are generally interchangeable - aside from just obviously being required to get the same answer. The choice of loop isn't significant. It's a matter of style.
2. So you can probably adjust by decrementing the iterator in a similar way to mine. I quickly tried without success, so it needs a little more time to resolve.
3. The adjustment must be made because of the way erase works. See http://www.cplusplus.com/reference/vector/vector/erase/

Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions.
...
Return value
An iterator pointing to the new location of the element that followed the last element erased by the function call.

The <algorithm> remove then erase functions evidently do the same but the decrement operation is built in - the nature/purpose of STL.

:)

Last edited on Nov 24, 2016 at 1:36am

Nov 24, 2016 at 2:37am

JLBorges (13770)

> The <algorithm> remove then erase functions evidently do the same

The typical implementation of std::remove() is somewhat more sophisticated;
see 'Possible implementation' in http://en.cppreference.com/w/cpp/algorithm/remove

The difference is remarkable when there are many repetitions (as in the snippet below).

#include <iostream>
#include <string>
#include <ctime>
#include <vector>
#include <algorithm>
#include <random>
#include <iomanip>

void crude_brute_force( std::vector<int>& num )
{
   for (auto iterA = num.begin(); iterA != num.end(); iterA++)
    {
        for (auto iterB = iterA + 1; iterB != num.end(); iterB++)
        {
            if(*iterB == *iterA)
            {
                num.erase(iterB);
                iterB--; //<--
            }
        }
    }
}

void std_algorithm_remove( std::vector<int>& num )
{
    auto end_unique = std::end(num) ;
    for( auto iter = std::begin(num) ; iter != end_unique ; ++iter )
    {
        // http://en.cppreference.com/w/cpp/algorithm/remove
        end_unique = std::remove( iter+1, end_unique, *iter ) ;
    }
    num.erase( end_unique, std::end(num) ) ;
}

int main()
{
    const std::size_t N = 1000 ;
    const int M = 300 ;

    std::vector<int> vec ;
    for( int i = 0 ; i < M ; ++i ) vec.insert( vec.end(), N, i ) ;
    std::shuffle( std::begin(vec), std::end(vec), std::mt19937{} ) ; // deliberately not seeded

    {
        auto cpy = vec ;
        const auto start = std::clock() ;
        crude_brute_force(cpy) ;
        const auto end = std::clock() ;
        std::cout << "   crude_brute_force: " << std::setw(5) << (end-start) * 1000.0 / CLOCKS_PER_SEC << " millisecs.\n" ;
    }

    {
        auto cpy = vec ;
        const auto start = std::clock() ;
        std_algorithm_remove(cpy) ;
        const auto end = std::clock() ;
        std::cout << "std_algorithm_remove: " << std::setw(5) << (end-start) * 1000.0 / CLOCKS_PER_SEC << " millisecs.\n" ;
    }
}

Edit & run on cpp.sh

echo -e '\n' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp -lsupc++ && ./a.out
echo -e '\n\n' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp && ./a.out


   crude_brute_force:  6310 millisecs.
std_algorithm_remove:    70 millisecs.



   crude_brute_force:  6080 millisecs.
std_algorithm_remove:    60 millisecs.

http://coliru.stacked-crooked.com/a/a9c31e9e5e17758c

Nov 24, 2016 at 2:51am

lastchance (6980)

Hello Kemort

When you say that my function doesn't work with your vector, would you be able to post in the whole code that you used to check that. Note that, because of the non-C++11 way that I initialised my vector, you will have to increase the number of push_back in main (my line 35) to accommodate the length of your test vector. That is not related to my routine.

I appreciate your sentence underlined. However, the purpose of using the while loop was that, unlike a for loop, it doesn't automatically increment the count variable or iterator at the end of the loop.

Nov 24, 2016 at 3:01am

cire (8284)

I appreciate your sentence underlined. However, the purpose of using the while loop was that, unlike a for loop, it doesn't automatically increment the count variable or iterator at the end of the loop.

A for loop automatically performs whatever update code you've given it at the end of the loop. A while loop automatically performs whatever code you've given it in the condition at the top of the loop. With respect to finding duplicates, your "automatic" increment of the index in the conditional accomplishes the same effect as an "automatic" increment of an iterator at the end of a for loop iteration.

Nov 24, 2016 at 3:02am

closed account (48T7M4Gy)

This is what I ran on my XCode:

#include <iostream>
#include <vector>
using namespace std;


template <class T> void noDuplicates( vector<T> &v )       // Remove duplicates without reordering
{
    unsigned int i, x;
    bool newValue;
    
    i = 1;
    while ( i < v.size() )
    {
        newValue = true;
        x = -1;
        while ( ++x < i && newValue )
        {
            if ( v[x] == v[i] )
            {
                v.erase( v.begin()+i );
                newValue = false;
            }
        }
        if ( newValue ) i++;
    }
}


int main()
{
    unsigned int i;
    // Initialise. Yes, I know you can initialise better with C++11, but it's just an example
    vector<int>num;
    int a[] = { 1, 4, 9, 9, 3, 3, 5, 16, 9, 9, 7, 4, 9, 11 }; // <-- *** ONLY CHANGE ***
    for ( i = 0; i < 10; i++ ) num.push_back( a[i] );
    
    noDuplicates<int>( num );
    
    for ( i = 0; i < num.size(); i++ ) cout << num[i] << endl;
}

Edit & run on cpp.sh


1
4
9
3
5
16
Program ended with exit code: 0

You are right about the vector intialisation. If it is initialised thus:

int main()
{
    unsigned int i;
    // Initialise. Yes, I know you can initialise better with C++11, but it's just an example
    vector<int>num = { 1, 4, 9, 9, 3, 3, 5, 16, 9, 9, 7, 4, 9, 11 };
    
    noDuplicates<int>( num );
    
    for ( i = 0; i < num.size(); i++ ) cout << num[i] << endl;
}

We get:

1
4
9
3
5
16
7
11
Program ended with exit code: 0

Nov 24, 2016 at 3:10am

lastchance (6980)

Kemort

Please read my post. You only allowed for 10 push_back in main, so your vector isn't complete. That needs to be increased to 14 in my line 35. I have checked it in the cpp shell and it then gives the correct answer.

Cire, please look at my line 24. I DON'T increment i if there is a duplicate; that is, if newValue goes false.

JLBorges, is there any chance of directly using my function in your timing routine? I am intrigued to know how a hard-coded loop compares with the STL.

Last edited on Nov 24, 2016 at 3:17am

Nov 24, 2016 at 3:22am

closed account (48T7M4Gy)

I'm not surprised to see evidence that the timings for the brute force vs <algorithm> approach are substantially different - the STL wasn't built for nothing.

As long as the answer is right, despite the obvious knowledge gain, the nominal 6 seconds of my life has not been a complete waste of time even though the sample vectors used by me have somewhat less than 1000 elements so the lost time is not quite that much, as significant as it is. :)

Nov 24, 2016 at 3:56am

closed account (48T7M4Gy)

lastchance
I think our posts have crossed but so there is no misunderstanding about where I am on this.

Please read my post.

I did.

You only allowed for 10 push_back in main, so your vector isn't complete.

I think I have answered that at http://www.cplusplus.com/forum/beginner/203059/#msg965754 I decided to initialize the vector in a consistent way but the nett result on that is the same.

That needs to be increased to 14 in my line 35.

Not the way I have done it.

I have checked it in the cpp shell

Excellent.

and it then gives the correct answer.

Just as I have done and written above.

Cheers

PS
I have convinced myself that your program solves the problem without the decrement I needed to use in mine.

I think the timings will be a substantially longer than the STL but it will be interesting to see where they fit.

You can download the source code for the STL which I have just done. My view is it is better perhaps to use it rather than fathom the code even though remove seems to have the decrement built in which is not unexpected by me.

Last edited on Nov 24, 2016 at 3:57am

Nov 24, 2016 at 4:07am

JLBorges (13770)

All three versions yield the same result.
http://coliru.stacked-crooked.com/a/84eb5d49b6a5bfe3

> is there any chance of directly using my function in your timing routine?
> I am intrigued to know how a hard-coded loop compares with the STL.

A carefully written hand coded loop would, in most cases, be as efficient as the equivalent standard algorithm.
Hint:
a. There is no need to call erase() every time a duplicate is found. We can overwrite the duplicate value, and then, at the very end, do a single bulk erase of the values at the back of the vector.
b. When a duplicate is found, we need to immediately move to the left only the values up to the next duplicate. This way, when an nth duplicate is found, we can move the values to the right of it by n positions; this would reduce the number of moves.
c. With end_unique = std::remove( iter+1, end_unique, *iter ) ;, the number of values to be scanned for duplicates is reduced; the elements already marked for removal are excluded from subsequent scans. This too can be done if the call to std::remove is replaced with a hand coded loop.
c. C++11: Since the values at the back are to be removed, the only requirement is that they be safely destructible. So the values can be moved to the left (instead of copied, which needs to preserve the original value). (This does not make any difference when the values are of type int).

Anyway, here are some numbers:

      version_one_kemort:  2760 millisecs.
  version_two_std_remove:    50 millisecs.
version_three_lastchance:  5660 millisecs.



      version_one_kemort:  2860 millisecs.
  version_two_std_remove:    30 millisecs.
version_three_lastchance:  5460 millisecs.

http://coliru.stacked-crooked.com/a/efca89f08ef4a835

Pages: 12

C++

Forum

Removing Duplicates