List of all unique characters in a string?

I want to append characters to a string, but want to make sure all the letters in the final list are unique.

Example: "aaabcabccd""abcd"

Now of course I have two solutions in my mind. One is using a list that will map the characters with their ASCII codes. So whenever I encounter a letter it will set the index to True. Afterwards I will scan the list and append all the ones that were set. It will have a time complexity of O(n).

Another solution would be using a dict and following the same procedure. After mapping every char, I will do the operation for each key in the dictionary. This will have a linear running time as well.

Since I am a Python newbie, I was wondering which would be more space efficient. Which one could be implemented more efficiently?

PS: Order is not important while creating the list.


The simplest solution is probably:

In [10]: ''.join(set('aaabcabccd'))
Out[10]: 'acbd'

Note that this doesn't guarantee the order in which the letters appear in the output, even though the example might suggest otherwise.

You refer to the output as a "list". If a list is what you really want, replace ''.join with list:

In [1]: list(set('aaabcabccd'))
Out[1]: ['a', 'c', 'b', 'd']

As far as performance goes, worrying about it at this stage sounds like premature optimization.


Use an OrderedDict. This will ensure that the order is preserved

>>> ''.join(OrderedDict.fromkeys( "aaabcabccd").keys())
'abcd'

PS: I just timed both the OrderedDict and Set solution, and the later is faster. If order does not matter, set should be the natural solution, if Order Matter;s this is how you should do.

>>> from timeit import Timer
>>> t1 = Timer(stmt=stmt1, setup="from __main__ import data, OrderedDict")
>>> t2 = Timer(stmt=stmt2, setup="from __main__ import data")
>>> t1.timeit(number=1000)
1.2893918431815337
>>> t2.timeit(number=1000)
0.0632140599081196

For completeness sake, here's another recipe that sorts the letters as a byproduct of the way it works:

>>> from itertools import groupby
>>> ''.join(k for k, g in groupby(sorted("aaabcabccd")))
'abcd'

char_seen = []
for char in string:
    if char not in char_seen:
        char_seen.append(char)
print(''.join(char_seen))

This will preserve the order in which alphabets are coming,

output will be

abcd

if the result does not need to be order-preserving, then you can simply use a set

>>> ''.join(set( "aaabcabccd"))
'acbd'
>>>