List of all unique characters in a string?
I want to append characters to a string, but want to make sure all the letters in the final list are unique.
Example: "aaabcabccd"
→ "abcd"
Now of course I have two solutions in my mind. One is using a list
that will map the characters with their ASCII codes. So whenever I encounter a letter it will set the index to True
. Afterwards I will scan the list and append all the ones that were set. It will have a time complexity of O(n).
Another solution would be using a dict
and following the same procedure. After mapping every char, I will do the operation for each key in the dictionary. This will have a linear running time as well.
Since I am a Python newbie, I was wondering which would be more space efficient. Which one could be implemented more efficiently?
PS: Order is not important while creating the list.
The simplest solution is probably:
In [10]: ''.join(set('aaabcabccd'))
Out[10]: 'acbd'
Note that this doesn't guarantee the order in which the letters appear in the output, even though the example might suggest otherwise.
You refer to the output as a "list". If a list is what you really want, replace ''.join
with list
:
In [1]: list(set('aaabcabccd'))
Out[1]: ['a', 'c', 'b', 'd']
As far as performance goes, worrying about it at this stage sounds like premature optimization.
Use an OrderedDict. This will ensure that the order is preserved
>>> ''.join(OrderedDict.fromkeys( "aaabcabccd").keys())
'abcd'
PS: I just timed both the OrderedDict and Set solution, and the later is faster. If order does not matter, set should be the natural solution, if Order Matter;s this is how you should do.
>>> from timeit import Timer
>>> t1 = Timer(stmt=stmt1, setup="from __main__ import data, OrderedDict")
>>> t2 = Timer(stmt=stmt2, setup="from __main__ import data")
>>> t1.timeit(number=1000)
1.2893918431815337
>>> t2.timeit(number=1000)
0.0632140599081196
For completeness sake, here's another recipe that sorts the letters as a byproduct of the way it works:
>>> from itertools import groupby
>>> ''.join(k for k, g in groupby(sorted("aaabcabccd")))
'abcd'
char_seen = []
for char in string:
if char not in char_seen:
char_seen.append(char)
print(''.join(char_seen))
This will preserve the order in which alphabets are coming,
output will be
abcd
if the result does not need to be order-preserving, then you can simply use a set
>>> ''.join(set( "aaabcabccd"))
'acbd'
>>>