Remove duplicated elements from a vector

Sep 11, 2009 at 8:14pm
I met a problem when I try to do this: removing the duplicated elements from a vector. I know there is a generic algorithm unique() working with sort() can remove consecutive duplicates in range from a vector. But the unique will keep the first element in each group of consecutive equal elements.

The thing I want is that if an elements showing up in vector more than once, it should be discarded.

For example, original vector = {1, 2, 3, 3, 4, 5, 4, 6, 7, 9, 9}, I want get the result = {1, 2, 5, 6, 7}.

I want to know if there is any efficient way to do that.

I try to use the iterator with the erase(), but I failed, system always report errors.(WinXP + Visual Studio)

1
2
3
4
5
6
7
8
9
for(it=vector.begin(); it<vector.end()-1; it++)
{
     if(*it == *(it+1))
     {
          vector.erase(it);
          vector.erase(it);
          it--;
     } 
}


Any suggestion is welcome. Thanks
Last edited on Sep 11, 2009 at 9:42pm
Sep 11, 2009 at 8:36pm
One quick way I could think off is, use a temp hash table. While iterating through vector, just check whether the current element of vector is present in the hash table. If it is not, then add it into the hash table with key as the current vector element and value as the count of that element. But if the value is present then just increment the value of that key in hash map. Once you are done with this,
just loop through the list of vectors again and check for which key the value is more than 1 in hash table. And delete that element from the vector.

Running time of this algorithm will still be O(n).

Sep 11, 2009 at 8:37pm
 
if(*it = *(it+1))


Are you doing assignment with =?
Sep 11, 2009 at 8:38pm
Hi, Robert, sorry, it is a typo, thanks
Sep 11, 2009 at 9:08pm
One way is:

1
2
std::vector<int> v; // assume v has the elements
std::set<int> s( v.begin(), v.end() );


And now s has the unique elements from v.

Runtime efficiency is O(n log(n))




Sep 11, 2009 at 9:10pm
Also

 
std::unique( v.begin(), v.end() );


if the duplicates are guaranteed to be contiguous (as OP's original code seemed to assume).
Runtime efficiency here is O(n).
Sep 11, 2009 at 9:19pm
Hi, jsmith, thanks for reply, but that is not want I want.

For example, the vector in my origian post is {1, 2, 3, 3, 4, 5, 4, 6, 7, 9, 9}, corresponding set will be {1, 2, 3, 4, 5, 6, 7, 9}, the result I want is {1, 2, 5, 6, 7}.

Meaning that I only want the elements which are only showing once.


Sep 11, 2009 at 10:34pm
So you only want the elements that are shown once in *both* of the sets?
Sep 11, 2009 at 10:35pm
1
2
3
4
5
6
7
8
9
for(it=vector.begin(); it<vector.end()-1; it++)
{
     if(*it == *(it+1))
     {
          vector.erase(it);
          vector.erase(it);
          it--;
     } 
}


It is also worth nothing the reason that this results in undefined behavior. After the first call to erase, the iterator is invalidated. vector::erase returns an iterator. You need to assign the return value to it if you want to end up with a valid iterator after the first erase call.

Moreover, in your example the duplicates aren't contiguous therefore the for loop concept wouldn't work anyhow.
Sep 11, 2009 at 10:41pm
Looks like kevinchkin is the only one that suggested a valid solution for what the OP wants. My thoughts were along similar lines. I'd use the foreach algorithm and create a functor that builds up the table. When finished you can use the table to remove all of the elements. You could even use the remove_if algorithm to operate on the same functor used by foreach.
Sep 12, 2009 at 12:56am
This is totally off the top of my head and uncompiled.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
template< typename Iter, typename Comp >
Iter find_if_except( Iter first, Iter last, Iter not_this_one, Comp compare )
{
    Iter res = std::find_if( first, last, compare );
    if( res == not_this_one ) {
        std::advance( res, 1 );
        res = std::find_if( res, last, compare );
    }

    return res;
}


std::vector<int> result;

for( std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i )
    if( find_if_except( v.begin(), v.end(), i, boost::lambda::_1 == *i ) == v.end() )
        result.push_back( *i );


Sep 12, 2009 at 2:02am
Very promising algorithm.

Iter looks like taking the form of std::vector<int>, so the "first", "last" and "not_this_one" seem to be Iter::iterator (i.e., std::vector<int>::iterator), this does not compile but illustrating the idea:
1
2
3
4
5
6
7
8
9
10
11
template< typename Iter, typename Comp >
Iter find_if_except( std::Iter::iterator first, std::Iter::iterator last, 
std::Iter::iterator not_this_one, Comp compare )
{
    Iter res = std::find_if( first, last, compare );
    if( res == not_this_one ) {
        std::advance( res, 1 );
        res = std::find_if( res, last, compare );
    }
    return res;
}


With jsmith's code above, I got the following compiling error:

 
error C2782: 'Iter find_if_except(Iter,Iter,Iter,Comp)' : template parameter 'Iter' is ambiguous
Last edited on Sep 12, 2009 at 2:07am
Sep 12, 2009 at 6:05am
This algorithm works:

1
2
3
4
5
6
7
8
9
10
11
(0) Get a std:vector<int> result; // store the result
(1) Get a std:vector<int> vec; // populated with N integers to be processed
(2) Get a std:vector<bool> duplicated(vec.size(), false); // initially every int is not duplicated.
(3) Assuming i and j be in the range of [0,vec.size() ), 
        for each vec.at(i) and vec.at(j) in the range of [0,vec.size() )
             if vec.at(i) == vec.at(j), 
               duplicated.at(i)=duplicated.at(j) = true; // assignment

(4) For i in the range of [0,vec.size) 
         if (!duplicated.at(i) ) 
              result.push_back(vec.at(i));


EDIT: For a vector of N completely unique integers, the two for statements make O(0)times of assignments on line 7) rather than O(N*N)in theory; for a vector of N completely duplicated integers, the two for statements make O(N-1) times of assignments (on line 7) rather than O(N*N) in theory. So it is an O(N) algorithm.
Last edited on Sep 12, 2009 at 1:26pm
Sep 12, 2009 at 11:22am
This algorithm works:

1
2
3
4
5
6
7
8
9
10
11
(0) Get a std:vector<int> result; // store the result
(1) Get a std:vector<int> vec; // populated with N integers to be processed
(2) Get a std:vector<bool> duplicated(vec.size(), false); // initially every int is not duplicated.
(3) Assuming i and j be in the range of [0,vec.size() ), 
        for each vec.at(i) and vec.at(j) in the range of [0,vec.size() )
             if vec.at(i) == vec.at(j), 
               duplicated.at(i)=duplicated.at(j) = true; // assignment

(4) For i in the range of [0,vec.size) 
         if (!duplicated.at(i) ) 
              result.push_back(vec.at(i));


EDIT: For a vector of N completely unique integers, the two for statements make O(0)times of assignments on line 7) rather than O(N*N)in theory; for a vector of N completely duplicated integers, the two for statements make O(N-1) times of assignments (on line 7) rather than O(N*N) in theory. So it is an O(N) algorithm.
Last edited on Sep 12, 2009 at 1:26pm
Sep 13, 2009 at 5:32pm
Still off the top of my head.

Probably my line 16 should change to be an iterator instead of const_iterator, because the
compiler is probably calling the non-const version of begin() and end(). Else explicitly
construct a const_iterator from the return of begin() and end().

That should take care of the compile error.

Topic archived. No new replies allowed.