Python provides a variety of sequences; understanding these builtin sequences saves us from
reinventing the wheel. We can also aspire to create APIs that support existing and user created
sequence types. Python offers two types of sequences:
Container sequences: These hold items of different types, for example list, tuple, etc.
Flat sequences: These hold items of the same type, for example str, bytes, etc.
A container sequence holds references to objects where as a flat sequence contains the value of the
contents in it’s own memory. To understand the above point clearly, we need to understand how python
stores objects.
Every python object has a header with metadata. For example, a float has:
ob_refcnt: Object’s reference count.
ob_type: Object’s type.
ob_fval: Object’s value.
Each of these fields take 8 bytes in a 64-bit built. So an array.array of floats actually holds the
raw values of the floats whereas a list of floats has several objects - the list iteself and reference
to each float object. Another way the sequences can be classified is based on it’s mutability:
Mutable sequences: list, array.array, collections.deque, etc.
Immutable sequences: tuple, str, bytes, etc.
List Comprehensions
List comprehensions or listcomps are used to create a new list from a list by transforming or/and
filtering the data in the original list.
Listcomps can be created using mulltiple loops over multiple iterators as well, the key idea is to
make sure that the script is readable and the transformation is obvious to the end user. It’s very
easy to abuse listcomps to write incomprehensible code. We should stick to plain old for loops in
such cases. Listcomps are also faster than map and filter.
To initialize sequences of types other than list, like arrays, tuples, etc., we can use genexp as
they save memory by yielding one item at a time using the iterator protocol. A listcomp creates a
whole list to feed into another constructor.
Beyond using tuples as “immutable lists”, they can also be thought of as a system of records. Tuples
along with “unpacking” make for very readable way to access elements from a sequence by avoiding indexing.
Here is the comment from Objects/listobject.c that explains what lists are doing:
* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
* Note: new_allocated won't overflow because the largest possible value
* is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
Coming back to the immutability aspect of Tuples, the fact that tuples are container style sequences,
they can hold mutable objects.
1
2
3
a=(10,"alpha",[1,2])b=(10,"alpha",[1,2])a==b
True
1
2
b[-1].append(99)a==b
False
1
b
(10, 'alpha', [1, 2, 99])
Examples like this can be a source of bugs. It’s always a good idea to avoid mutable objects in a tuple.
A tuple without mutable objects can be used as a key for a dict. This can be verified using the hash method.
Unpacking helps avoid indexing to extract items from sequences. It works with any iterable object,
even if the object doesn’t support index notation [].
1
2
country,capital=("India","New Delhi")country
'India'
1
capital
'New Delhi'
An elegant way to use unpacking is to swap values without using a temporary variable.
item | cost qnt total
----------------------------
apple | 3.2 10 32.0
milk | 3 20 60
Slicing
Slices in python excldue the last item. This convention is common across many languages. This feature
has many convenient side effects:
Length of range(3) and my_list[:3] is the same.
It’s easy to compute the length of a slice when start and stop are given using stop - start.
It’s easy to split a sequence at an index x using my_list[:x] and my_list[x:]
1
2
my_list=list(range(10))my_list[:3],my_list[3:]
([0, 1, 2], [3, 4, 5, 6, 7, 8, 9])
One uncommon use of slices is to create constants to make the code more readable. By assigning names
to slices we can be more explicit instead of needing the user to figure out what the indexes mean.
1
2
3
4
5
6
7
8
9
10
11
12
invoice="""
1 apple $3.2 10
2 banana $1.5. 5
"""SKU_ID=slice(0,2)SKU_NAME=slice(3,9)PRICE=slice(10,14)QUANTITY=slice(15,17)forsku_lineininvoice.split("\n")[1:-1]:print(sku_line[SKU_ID],sku_line[SKU_NAME],sku_line[PRICE],sku_line[QUANTITY])
1 apple $3.2 10
2 banana $1.5 5
The [] operator can also take multpile indexes and slices separated by commas. This pattern is common
in external libraries like numpy and pandas where this notation is used to indicate index/slice across
rows and columns.
1
2
3
4
importnumpyasnparr=np.array([[1,2,3],[4,5,6]])arr
array([[1, 2, 3],
[4, 5, 6]])
1
arr[:,1:]# all rows, columns 2 and 3
array([[2, 3],
[5, 6]])
Slices can also be used on the left-hand side of an assignment operation or as the target of a del operation.
1
2
my_list=list(range(10))my_list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1
2
my_list[3:5]=[30,40]my_list
[0, 1, 2, 30, 40, 5, 6, 7, 8, 9]
1
2
delmy_list[3:5]my_list
[0, 1, 2, 5, 6, 7, 8, 9]
Using * with Sequences
To concatenate multiple copies of the same sequence, we multiply it by an integer which creates a new sequence.
1
2
3
my_list=[1,2,3]new_list=my_list*3new_list
[1, 2, 3, 1, 2, 3, 1, 2, 3]
1
id(my_list),id(new_list)
(140325040574144, 140325040580864)
Using * to multiply sequences with mutable objects can lead to weird behavior due to the fact that
we end of referring to the same object.
The augmented assignment += works using the special method __iadd__. If __iadd__ is not implemented,
Python falls back to calling __add__.
1
2
3
my_list=[1,2,3]my_list+=[4,5]my_list
[1, 2, 3, 4, 5]
__iadd__ works similar to .extend(). If my_list does not implement __iadd__, a+=b is the same as a=a+b.
Usage
Although it’s highlighted everywhere that lists are container type sequences that can hold objects of
different types, in practice we typically store objects of similar type in a list so that we can apply
some common operation on them(i.e. they should all “quack” whether or not they are 100% ducks).
Tuples being used as records can be used to store objects of different types in practice.
1
2
my_list=[1,2,"30",4,"5"]my_list
[1, 2, '30', 4, '5']
1
sorted(my_list)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-135-cb83e3f42d3f> in <module>
----> 1 sorted(my_list)
TypeError: '<' not supported between instances of 'str' and 'int'
Most of the languages need us to provide a cmp(a, b) method to make comparisons to be able to sort
a sequence. In python, using key argument in list.sort, max, min or sorted methods removes the need
for this cmp() method. Python does compare keys but that is done in optimized C code and not using a
method provided by us.