Understanding the set() function
In python, set()
is an unordered collection with no duplicate elements. However, I am not able to understand how it generates the output.
For example, consider the following:
>>> x = [1, 1, 2, 2, 2, 2, 2, 3, 3]
>>> set(x)
set([1, 2, 3])
>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> set(y)
set([8, 1, 6])
>>> z = [1, 1, 6, 6, 6, 6, 6, 7, 7]
>>> set(z)
set([1, 6, 7])
Shouldn't the output of set(y)
be: set([1, 6, 8])
? I tried the above two in Python 2.6.
Solution 1:
Sets are unordered, as you say. Even though one way to implement sets is using a tree, they can also be implemented using a hash table (meaning getting the keys in sorted order may not be that trivial).
If you'd like to sort them, you can simply perform:
sorted(set(y))
which will produce a sorted list containing the set's elements. (Not a set. Again, sets are unordered.)
Otherwise, the only thing guaranteed by set
is that it makes the elements unique (nothing will be there more than once).
Hope this helps!
Solution 2:
As an unordered collection type, set([8, 1, 6])
is equivalent to set([1, 6, 8])
.
While it might be nicer to display the set contents in sorted order, that would make the repr()
call more expensive.
Internally, the set
type is implemented using a hash table: a hash function is used to separate items into a number of buckets to reduce the number of equality operations needed to check if an item is part of the set.
To produce the repr()
output it just outputs the items from each bucket in turn, which is unlikely to be the sorted order.
Solution 3:
As +Volatility and yourself pointed out, sets are unordered. If you need the elements to be in order, just call sorted
on the set:
>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> sorted(set(y))
[1, 6, 8]
Solution 4:
Python's sets (and dictionaries) will iterate and print out in some order, but exactly what that order will be is arbitrary, and not guaranteed to remain the same after additions and removals.
Here's an example of a set changing order after a lot of values are added and then removed:
>>> s = set([1,6,8])
>>> print(s)
{8, 1, 6}
>>> s.update(range(10,100000))
>>> for v in range(10, 100000):
s.remove(v)
>>> print(s)
{1, 6, 8}
This is implementation dependent though, and so you should not rely upon it.
Solution 5:
After reading the other answers, I still had trouble understanding why the set comes out un-ordered.
Mentioned this to my partner and he came up with this metaphor: take marbles. You put them in a tube a tad wider than marble width : you have a list. A set, however, is a bag. Even though you feed the marbles one-by-one into the bag; when you pour them from a bag back into the tube, they will not be in the same order (because they got all mixed up in a bag).