This post contains references to Big O notation1, if you aren't familiar with it (or don't remember how it works) now is a good time for a refresher.
There are a ton of data structures available in the standard Python library, most of which go underutilized. Today I'd like to focus on one type in particular you should probably be using more than you are: the set
. You may be already aware of the general concept of sets in computer science2, the Python set3 is exactly that.
What is it?
For those of you who don't already know, a set
is an unordered collection of hashable objects which cannot contain any duplicates. The hashable part is key here, much like the keys in a dict
4 (A.K.A. associative array 5), each entry is hashed to create an index into the underlying data structure. You can think of a set
much like a dict
without any values.
When to use it?
A set
is an extremely useful alternative to the more popular list
in several scenarios:
- Looking for membership (e.g. the
in
operator). The lookup time for aset
is O(1) whereas for a list it's O(n)6. - If you'll be removing members, same story as
in
here regarding complexity. - If you need to compare, contrast, or combine collections - the
set
has operators for all of these. - If you cannot have any duplicate values in the collection.
When not to use it?
- If you need to have more than one copy of the same value in the collection.
- If the order of the objects matters
- If the objects in your collection are not hashable7
How to use it?
set_with_values = {'a', 'b', 'c'}
other_set = set() # Note {} would be a dict
assert 'a' in set_with_values # Super speedy operation
other_set.add('c')
assert other_set < set_with_values # This means other_set is a subset of set_with_values
assert set_with_values > other_set # Super set
other_set.add('d')
# Set defines a bunch of useful operators
assert set_with_values | other_set == {'a', 'b', 'c', 'd'}
assert set_with_values & other_set == {'c'}
assert set_with_values - other_set == {'a', 'b'}
assert set_with_values ^ other_set == {'a', 'b', 'd'}
set_with_values |= other_set # You can perform updates with all of those operators
assert set_with_values == {'a', 'b', 'c', 'd'} # Note only 1 copy of 'c'
set_with_values.remove('d') # Again, super fast compared to lists
Conclusion
If you aren't already using sets all over your code, you probably should be. Basically any time you're using in
with a list or del
with a list, you should consider using a different structure.