Python List Comprehensions
Table of Contents
- Introduction
- When to use ListComps
- Getting Practice
- Advanced Usage
- Performance Advantages
- Where to Go from Here
You can access the code from this post in Kite’s GitHub repository.
Introduction
Lists are easy to recognize in Python. Whenever we see brackets ‘[]’, we know that lists are afoot. Declaring lists is just about as easy as it gets in Python.
Here’s a walkthrough.
The first step:
my_list = []
Then, if we want to add anything to the list, we just call:
my_list.append(1)  # one element
or
my_list.extend([2, 3])  # several elements
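As a quick sketch of the difference between the two calls (the values are arbitrary and chosen only for illustration), note that append always adds exactly one element, even when that element is itself a list:

```python
my_list = []

my_list.append(1)        # adds a single element
my_list.extend([2, 3])   # adds each element of the iterable
my_list.append([4, 5])   # adds the whole list as ONE nested element

print(my_list)  # [1, 2, 3, [4, 5]]
```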
Nothing could be easier to read, and if your lists aren’t broken, why fix them? There are actually two major reasons, and even a bonus one:
The two major reasons are:
- You can produce working code quicker.
- Code with List Comprehensions is even easier to read (with a bit of practice).
The bonus reason:
- ListComps have a moderate performance advantage.
(Note: You may see List Comprehensions variously referred to as comprehensions, List Comprehensions, or ListComps.)
I don’t consider the performance advantage a major factor because, if performance is a major concern, you might want to look at a different data type altogether, such as a dictionary.
For beginners with List Comprehensions or Dictionary Comprehensions, it can be hard to see how they could be easier to read than the good ol’ list declaration and manipulation using explicit methods. The answer is practice, and less room for error: there’s much less room for error in a ListComp than in a nested for loop. Because comprehensions usually (but not necessarily) take place on a single line, the brain can digest more meaning at once.
With practice, you’ll prefer to read code that is written with comprehensions because:
- You automatically know the intended output.
- You can see the input and the way it’s modified.
- Like lambda functions, ListComps are expressions, so you can easily pass them inline wherever a list is expected.
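As a small sketch of that last point (the words list below is made up for illustration), a comprehension is an ordinary expression, so it can be written directly inside a function call instead of being built up beforehand:

```python
# Hypothetical sample data, not from the original post.
words = ["kite", "python", "list"]

# Each comprehension is passed straight to a function as its argument.
longest = max([len(word) for word in words])
shouted = ", ".join([word.upper() for word in words])

print(longest)  # 6
print(shouted)  # KITE, PYTHON, LIST
```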
When to use ListComps
What exactly are List Comprehensions, and why would we need them when lists are so flexible out-of-the-box?
List Comprehensions are used when a list should be populated consistently according to a known pattern, often to extract data from an existing collection.
For example, let’s say you have some JSON data from an API, which, when parsed by a library such as requests, winds up being a list of thousands of phone numbers, each stored as a dictionary with several fields:
phones = [
    {
        'number': '111-111-1111',
        'label': 'phone',
        'extension': '1234',
    },
    {
        'number': '222-222-2222',
        'label': 'mobile',
        'extension': None,
    }
]
What if our assignment was just to print out a list of numbers?
Of course, we can iterate through the list traditionally:
my_phone_list = []
for phone in phones:
    my_phone_list.append(phone['number'])
>>> my_phone_list
['111-111-1111', '222-222-2222']
A List Comprehension which accomplishes the same result takes only one line:
>>> [phone['number'] for phone in phones]
['111-111-1111', '222-222-2222']
A List Comprehension follows the basic pattern…
[ <do something to item> for <item> in <list>]
…or if you want to keep the result around:
output_list = [ <manipulate item> for <item> in <input list> ]
Remember, if a List Comprehension appears confusing, as they often do at first, feel free to break it up into a for loop:
output_list = []
for <item> in <input list>:
    output_list.append(<manipulate item>)
Getting Practice
Because a List Comprehension produces the same output as a for loop, the best way to practice is to think about how you would rewrite each for loop you come across. Whenever you see for, ask yourself: “Now how would this look if it were a List Comprehension?”
Without writing any code, you know it would be shorter, for one thing!
Next, just think where you would put that for expression:
[ … <for item in list>]
^ Start with brackets, and put your for expression at the end.
Then, decide what the items should be in your output list and stick them right at the beginning:
[ <output items> for … in … ]
^ Right at the beginning.
Finally, see if your IDE or interpreter throws any errors, and check your syntax.
Congratulations! You just practiced List Comprehensions. Now repeat, and you’ll be thinking in the language of comprehensions in no time.
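Here is one pass through those steps as a minimal sketch. The numbers list and the squaring operation are arbitrary choices for illustration:

```python
numbers = [1, 2, 3, 4]

# Starting point: an ordinary for loop that appends to a list.
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# Step 1: open brackets and move the for expression to the end:
#     [ ... for n in numbers]
# Step 2: put the output item right at the beginning:
squares_comp = [n * n for n in numbers]

# Step 3: check that both approaches agree.
print(squares_comp)                   # [1, 4, 9, 16]
print(squares_loop == squares_comp)   # True
```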
Advanced Usage
Nested ListComps
Lists in Python are often nested, so of course we’ll want to be able to produce nested Lists using ListComps.
And guess what? They still fit on one line.
Let’s use an arbitrary example for form, 3 rows of x,y,z.
fields = ['x', 'y', 'z']
rows = [1, 2, 3]
table = []
for r in rows:
    row = []
    for f in fields:
        row.append(f)
    table.append(row)
>>> table
[['x', 'y', 'z'], ['x', 'y', 'z'], ['x', 'y', 'z']]
Now see if this is any easier on the eyes:
table = [[f for f in fields] for row in rows]
Still confused? Nested List Comprehensions can be tough at first, but just think about your input and output lists and where they go in the syntax.
What we want here is a list of lists – a list of rows. So, we know our output needs to be a row, which is a list of values.
Since our output is a list, that’s what comes first!
[< give me a list > for row in rows]
^^ The output is a list
We can look at our for loop above to figure that out, or just think of the simplest comprehension in our heads:
[f for f in fields] # you don't *have* to do anything to f
Now since we just want to do that for every element in rows (basically a range), we just say that!
[[f for f in fields] for row in rows]
Or even more simply…
[fields for row in rows]
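One caveat with the shorter form is worth sketching: it does not copy fields, so every row refers to the same list object, while the inner comprehension builds a fresh list per row. The example below mutates one cell in each table to make the difference visible:

```python
fields = ['x', 'y', 'z']
rows = [1, 2, 3]

copied = [[f for f in fields] for row in rows]  # a new inner list for each row
shared = [fields for row in rows]               # the same list object, three times

copied[0][0] = 'changed'
shared[0][0] = 'changed'   # this also mutates `fields` itself

print(copied)  # [['changed', 'y', 'z'], ['x', 'y', 'z'], ['x', 'y', 'z']]
print(shared)  # [['changed', 'y', 'z'], ['changed', 'y', 'z'], ['changed', 'y', 'z']]
```

If the rows will be modified independently later on, the version with the inner comprehension is the safer choice.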
The first version (with the inner comprehension) is more useful if you need to manipulate f in some way. Try executing functions inside the ListComp:
>>> [[print(f) for f in fields] for row in rows]
x
y
z
x
y
z
x
y
z
[[None, None, None], [None, None, None], [None, None, None]]
First, print() is executed for the element, then the return value is passed to the list. This is an easy way to perform work on an element, and then see if the function executed successfully for each one.
Note that the list returned is not the list you want, but is composed of the results of evaluating the function.
Suppose instead that the fields were integers which needed to be converted to strings, in which case you could do something like:
>>> fields = [123, 456, 789]
>>> [[str(f) for f in fields] for row in rows]
[['123', '456', '789'], ['123', '456', '789'], ['123', '456', '789']]
Assuming the rows all had different values, List Comprehensions offer a very concise and readable way to apply that function to all the values.
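A minimal sketch of that case, assuming each row is itself a list of integers (the values are invented for illustration). The inner comprehension now iterates over the row itself:

```python
# Hypothetical table in which every row holds different values.
rows = [
    [123, 456, 789],
    [10, 20, 30],
]

# Outer comprehension walks the rows; inner one converts each value.
as_strings = [[str(value) for value in row] for row in rows]

print(as_strings)  # [['123', '456', '789'], ['10', '20', '30']]
```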
In the real world, this is an easy way to populate a table for submission to an API which requires multi-dimensional arrays (hint: this is a great way to bulk update a Google Sheet!). Syntactically, it’s much easier to pass a ListComp to a post request than to write out a for loop before the request each time.
Dictionary Comprehensions
We’ve talked about using ListComps to pass formatted information to Dictionaries, but wouldn’t it be nice if you could build a dictionary just like we’ve been building lists?
Good news is, you can: they’re called Dictionary Comprehensions.
There are two different use cases we need to distinguish between. Building a list of dictionaries is still technically a ListComp, since our output is a list of dictionaries, but it is a quick way to map values to a list of dictionaries:
>>> [{str(item):item} for item in [1,2,3,]]
[{'1': 1}, {'2': 2}, {'3': 3}]
A Dictionary Comprehension takes any input and outputs a dictionary, as long as you assign a key and a value in the “do something” area at the beginning.
{v:k for (k, v) in my_dict.items()}
^^ Associate key and value here.
Unlike the list of dictionaries above, we want to have a dictionary as output. So, let’s start with a dictionary that serves a similar purpose as does the string to integer map:
dict_map = {
    'apple': 1,
    'cherry': 2,
    'earwax': 3,
}
Maybe we need to reverse the values for our map to work with some function. We could write a for loop and iterate through the dictionary, switching the keys and values. Or, we could use a Dictionary Comprehension to accomplish the same in one line. The curly braces let us know that we want the output to be a dictionary:
>>> {v:k for (k, v) in dict_map.items()}
{1: 'apple', 2: 'cherry', 3: 'earwax'}
All we’ve done is reverse the order up front for each tuple returned by .items(). If you practice reading and writing comprehensions, this one-line option is far more readable, and thus Pythonic, than a for loop.
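For comparison, here is a sketch of the equivalent for loop next to the comprehension. It also shows one thing to watch for: dictionary keys must be unique, so if two keys share a value, only the last one survives the inversion. The with_duplicates dictionary is a made-up example:

```python
dict_map = {'apple': 1, 'cherry': 2, 'earwax': 3}

# Explicit loop version of the inversion.
inverted_loop = {}
for k, v in dict_map.items():
    inverted_loop[v] = k

# Dictionary Comprehension version.
inverted_comp = {v: k for k, v in dict_map.items()}

print(inverted_loop == inverted_comp)  # True

# Duplicate values collapse into a single key; the later entry wins.
with_duplicates = {'apple': 1, 'cherry': 1}
collapsed = {v: k for k, v in with_duplicates.items()}
print(collapsed)  # {1: 'cherry'}
```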
Logic and comparisons in List Comprehensions
One of the most powerful features of List Comprehensions is the ability to conditionally pass values to the list with logical operators. While we need to remember to “do something” up front, at the beginning of the comprehension, we can also “filter” the input inside the comprehension on the same line.
Let’s start with the simplest example. We have a list [1,2,3]. But we only want values less than 3. To filter out the values we don’t want using a comprehension:
>>> values = [1, 2, 3]
>>> [i for i in values if i < 3]
[1, 2]
Let’s look at our dictionary from earlier:
dict_map = {
    'apple': 1,
    'cherry': 2,
    'earwax': 3,
}
Maybe the integer values represent prices for a slice of pie, and we only have two doubloons. We could generate a list with a comprehension and then conditionally weed out the values that we don’t want, or we could use logic inside the List Comprehension:
>>> [k for k, v in dict_map.items() if v < 3]
['apple', 'cherry']
There are more possibilities for this pattern than we can cover here, but with nested, filtered comprehensions, you can output just about any list imaginable. To practice, feel free to start with normal iteration, and then match up the lines of your loop to the comprehension elements.
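A few of those variations, sketched with made-up data. Note the difference between a trailing if, which filters elements out, and a conditional expression up front, which keeps every element but chooses what to emit:

```python
values = [1, 2, 3, 4, 5, 6]

# Filter at the end, transformation up front.
even_squares = [v * v for v in values if v % 2 == 0]

# Conditional expression up front: nothing is filtered out.
labels = ['even' if v % 2 == 0 else 'odd' for v in values]

# Two for clauses plus a filter: flatten a nested list, keeping odd cells.
table = [[1, 2], [3, 4], [5, 6]]
flat_odds = [cell for row in table for cell in row if cell % 2 == 1]

print(even_squares)  # [4, 16, 36]
print(labels)        # ['odd', 'even', 'odd', 'even', 'odd', 'even']
print(flat_odds)     # [1, 3, 5]
```

In the flattening example the for clauses read left to right in the same order as the equivalent nested loops would be written.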
Performance Advantages
List comprehensions don’t do the same exact thing as a for loop that appends to a list. The results are the same, but under the hood they work slightly differently. To see how, we can look at the Python bytecode produced by both the for loop and the comprehension.
Consider the following example, using a string as our input list:
original_string = 'hello world'
spongecase_letters = []
for index, letter in enumerate(original_string):
    if index % 2 == 1:
        spongecase_letters.append(letter.upper())
    else:
        spongecase_letters.append(letter)
spongecase_string = ''.join(spongecase_letters)
First, let’s define a function to make it easier to read:
def spongecase(index, letter):
    if index % 2 == 1:
        return letter.upper()
    else:
        return letter
original_string = "hello world"
spongecase_letters = []
for index, letter in enumerate(original_string):
    transformed_letter = spongecase(index, letter)
    spongecase_letters.append(transformed_letter)
spongecase_string = ''.join(spongecase_letters)
# hElLo wOrLd
The syntax of a list comprehension can be represented as:
[transformed_item for item in original_list]
or
[item_you_want for item_you_have in original_list]
So, let’s try:
[spongecase(index, letter) for index, letter in enumerate(original_string)]
Ta-da! You’re left with a list of what you want (spongecase-d letters) given what you have (index and letter) from the original string.
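Put together as a self-contained sketch, with the join step from the loop version applied to the comprehension's result:

```python
def spongecase(index, letter):
    # Upper-case letters at odd positions; leave the rest unchanged.
    if index % 2 == 1:
        return letter.upper()
    return letter


original_string = 'hello world'
spongecase_letters = [
    spongecase(index, letter) for index, letter in enumerate(original_string)
]
spongecase_string = ''.join(spongecase_letters)

print(spongecase_string)  # hElLo wOrLd
```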
While we can, in practice, think of this syntax as a concise for loop, that’s not exactly true.
Bytecode is the set of instructions sent down to the interpreter that determine what C commands to run to execute the Python code. There are bytecodes for storing constants, like when we store “hello world” to the variable original_string. There are bytecodes for loading functions, like when we load spongecase in order to call it. If the for loop implementation and the list comprehension were doing exactly the same thing, they should produce the same bytecode instructions for the interpreter.
You can view a function’s bytecode with dis, which is part of the standard lib, so if we package our implementations inside functions we can compare the bytecode produced by the two methods. The actual list construction is in fact different between the two implementations: For the for loop, the relevant section is here, where we load the spongecase function, call it to transform the letter, load the append method and then call it to add the transformed letter to the list.
11 46 LOAD_FAST 0 (spongecase) # loads spongecase function
49 LOAD_FAST 3 (index)
52 LOAD_FAST 4 (letter)
55 CALL_FUNCTION 2 # calls spongecase on index and letter
58 STORE_FAST 5 (transformed_letter)
12 61 LOAD_FAST 2 (spongecase_letters) # loads the spongecase_letters list
64 LOAD_ATTR 1 (append) # loads the append method
67 LOAD_FAST 5 (transformed_letter)
70 CALL_FUNCTION 1 # calls the append method to append transformed_letter to spongecase_letters
For the list comprehension, the corresponding section looks different; there is a special bytecode called LIST_APPEND that performs the same operation as the append method, but has its own bytecode:
40 LOAD_FAST 0 (spongecase) # loads the spongecase function
43 LOAD_FAST 2 (index)
46 LOAD_FAST 3 (letter)
49 CALL_FUNCTION 2 # calls the spongecase function on index and letter
52 LIST_APPEND 2 # appends the result of the spongecase call to an unnamed list
55 JUMP_ABSOLUTE 28
58 STORE_FAST 4 (spongecase_letters) # stores the resulting list as spongecase_letters
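To try a comparison like this yourself, something along the following lines should work. This is a sketch, not the exact code behind the listings above: with_loop, with_comprehension, and opcode_names are hypothetical helper names, and the exact opcodes and offsets vary between Python versions, so your output will likely not match the listings line for line.

```python
import dis
import types


def spongecase(index, letter):
    return letter.upper() if index % 2 == 1 else letter


def with_loop(original_string):
    spongecase_letters = []
    for index, letter in enumerate(original_string):
        spongecase_letters.append(spongecase(index, letter))
    return spongecase_letters


def with_comprehension(original_string):
    return [spongecase(index, letter)
            for index, letter in enumerate(original_string)]


def opcode_names(code):
    """Collect opcode names from a code object and any code objects nested in it.

    On some Python versions a comprehension is compiled to its own nested
    code object, so the constants are searched recursively as well.
    """
    names = {instruction.opname for instruction in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opcode_names(const)
    return names


loop_ops = opcode_names(with_loop.__code__)
comp_ops = opcode_names(with_comprehension.__code__)

print('LIST_APPEND in loop version:         ', 'LIST_APPEND' in loop_ops)
print('LIST_APPEND in comprehension version:', 'LIST_APPEND' in comp_ops)

# For the full human-readable listing, call dis.dis(with_loop) and
# dis.dis(with_comprehension) instead.
```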
The important thing here is that the list comprehension syntax actually produces different instructions for the interpreter than the for loop does. These instructions are optimized for building lists: loading the append method is a step that takes nonzero time, for example, so replacing the append load and function call with an instruction specifically for appending to lists saves the interpreter one step per append, or one step per element of the original list. Or, to put it another way, list comprehensions are not just syntactic sugar. They are actually more efficient at building lists than for loops because they compile down to a slightly different bytecode pattern that is optimized specifically for list construction.
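How large the difference is in practice depends on the Python version and the work done per element, so it is worth measuring rather than assuming. A rough sketch using timeit from the standard library follows; the function names, list size, and repetition count are arbitrary choices for illustration, and no particular timings are implied:

```python
import timeit


def build_with_loop(n):
    result = []
    for i in range(n):
        result.append(i * 2)
    return result


def build_with_comprehension(n):
    return [i * 2 for i in range(n)]


# Both approaches must produce the same list before timing means anything.
assert build_with_loop(1000) == build_with_comprehension(1000)

loop_seconds = timeit.timeit(lambda: build_with_loop(1000), number=200)
comp_seconds = timeit.timeit(lambda: build_with_comprehension(1000), number=200)

print(f"for loop:      {loop_seconds:.4f} s")
print(f"comprehension: {comp_seconds:.4f} s")
```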
Where to Go from Here
Now that we’ve covered the elements and syntax of List Comprehensions, it’s time to go out and get some practice. Remember to keep the elements straight: “Do something” up front to the item in the middle, provided by the list at the end. Optionally, filter the input list using logical operators. Because comprehensions are readable and concise, they may safely be considered Pythonic, even if they are a mystery at first.
You can never be “too good” at List Comprehensions. Often, incredibly complex iterative loops can be replaced with one or two ListComps. This is especially true when writing callbacks for a web framework like Flask, or dealing with an API that returns deeply nested JSON. You may need to create a simple list or dictionary out of a veritable forest of branched responses, and List Comprehensions are the way to go.
This technique is especially useful when dynamically generating lists on changing data, where the logic needs to be decoupled from the input. On the other hand, if you can easily understand and read comprehensions, try to push the envelope with some multi-line comprehensions that conditionally produce complex output from complex input. Check out the Python documentation section on Data Structures to explore the various possibilities of comprehensions, and then experiment with those ideas in your own projects.
This post is a part of Kite’s new series on Python. You can check out the code from this and other posts on our GitHub repository.