How to make elements of vector unique? (remove non adjacent duplicates)

I have a vector containing few non-adjacent duplicates.

As a simple example, consider:

2 1 6 1 4 6 2 1 1

I am trying to make this vector unique by removing the non-adjacent duplicates and maintaining the order of elements.

Result would be:

2 1 6 4 

The solutions I tried are:

  1. Inserting into a std::set but the problem with this approach is that it will disturb the order of elements.
  2. Use the combination of std::sort and std::unique. But again same order problem.
  3. Manual duplicate elimination:

        Define a temporary vector TempVector.
        for (each element in a vector)
        {
            if (the element does not exists in TempVector)
            {
                add to TempVector;
            }
        }
        swap orginial vector with TempVector.
    

My question is:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?


I think you would do it like this:

I would use two iterators on the vector :

The first of one reads the data and inserts it a temporary set.

When the read data was not in the set you copy it from the first iterator to the second and increment it.

At the end you keep only the data up to the second iterator.

The complexity is O( n .log( n ) ) as the lookup for duplicated elements uses the set, not the vector.

#include <vector>
#include <set>
#include <iostream>

int main(int argc, char* argv[])
{
    std::vector< int > k ;

    k.push_back( 2 );
    k.push_back( 1 );
    k.push_back( 6 );
    k.push_back( 1 );
    k.push_back( 4 );
    k.push_back( 6 );
    k.push_back( 2 );
    k.push_back( 1 );
    k.push_back( 1 );

{
    std::vector< int >::iterator r , w ;

    std::set< int > tmpset ;

    for( r = k.begin() , w = k.begin() ; r != k.end() ; ++r )
    {
        if( tmpset.insert( *r ).second )
        {
            *w++ = *r ;
        }
    }

    k.erase( w , k.end() );
}


    {
        std::vector< int >::iterator r ;

        for( r = k.begin() ; r != k.end() ; ++r )
        {
            std::cout << *r << std::endl ;
        }
    }
}

Without using a temporary set it's possible to do this with (possibly) some loss of performance:

template<class Iterator>
Iterator Unique(Iterator first, Iterator last)
{
    while (first != last)
    {
        Iterator next(first);
        last = std::remove(++next, last, *first);
        first = next;
    }

    return last;
}

used as in:

vec.erase( Unique( vec.begin(), vec.end() ), vec.end() );

For smaller data sets, the implementation simplicity and lack of extra allocation required may offset the theoretical higher complexity of using an additional set. Measurement with a representative input is the only way to be sure, though.