
Guide to Python Dictionaries

Aaron Harris
February 14, 2019

Table of Contents

  1. What is a Python Dictionary?
  2. How to Create and Reference Python Dictionaries
  3. Practical Use Cases
  4. Iterating through Data Stored in Dictionaries
  5. Dictionaries as Nested Data Structures
  6. The Python Home for JSON
  7. Pitfalls and Dictionary-like Alternatives
  8. Performance Considerations

What is a Python Dictionary?

Second only to the Python list in how often it is used, the dictionary or “dict” is a place in memory to store a series of values, also called a collection. The dictionary is special because values are not referenced in order using a numerical index. Rather, in a dictionary, values are referenced with a user-defined key, just as words in a physical dictionary are “keys” associated with the “value” of their meaning. This key is usually a string, but it can be any hashable object, such as a number or a tuple.

my_dict = {'my_key' : 'my_value'}

For example, instead of referring to the first value in a list with my_list[0], one refers to any dictionary element by its key:

>>> my_dict['my_key']
'my_value'

These explicit references are more legible than list index notation and, in many situations, make code easier to maintain. Looking up a value by its key is also typically faster than searching a list for it.

Additionally, key-value combinations allow complex hierarchies of nested data. As words in a dictionary are keys to the values of their definitions, so letters of the alphabet are keys to the values of words themselves. Such layered structure is often necessary when dealing with complex data. With this special feature, a dictionary lives somewhere between lists and user-defined classes. Python dictionaries are more feature-rich than lists, but don’t require as much effort as a user-defined class with unique attributes and methods.

How to Create and Reference Python Dictionaries

There are several ways to declare a dictionary, depending on the situation. The simplest is to enclose the keys and values in curly braces, like so:

my_dict = {'key1': 1, 'key2': 2}

You can also pass key-value pairs as keyword arguments to the dict() constructor, though this is less common and only works when the keys are valid Python identifiers:

my_dict = dict(key1 = 1, key2 = 2)

Assigning values on declaration is useful when returning a dictionary with dynamic values, or as part of a lambda or comprehension. Both the keys and the values may be references to variables defined elsewhere, allowing dynamic assignment.
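As a minimal sketch of dynamic assignment, the snippet below builds a dictionary whose key and value both come from variables. The names field_name, visit_count, and pairs are hypothetical and exist only for illustration.

```python
# Hypothetical variables standing in for values computed elsewhere.
field_name = 'visits'
visit_count = 41 + 1

# Both the key and the value are evaluated when the dictionary is declared.
stats = {field_name: visit_count, 'source': 'example'}

# dict() also accepts an iterable of (key, value) pairs.
pairs = [('key1', 1), ('key2', 2)]
from_pairs = dict(pairs)

print(stats)       # {'visits': 42, 'source': 'example'}
print(from_pairs)  # {'key1': 1, 'key2': 2}
```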

Sometimes it’s necessary to declare an empty dictionary, as values might be added later, but other parts of the code need something to reference in the meantime.

To declare an empty dictionary:

my_dict = {}
my_dict = dict()

Values may then be appended to this dictionary when they become available with the assignment operator:

my_dict['key'] = 123

>>> my_dict
{'key': 123}

Python dictionaries are stored and referenced like any other variable. In fact, dictionaries can be stored within dictionaries, and often are. In this case, just refer to the stored dictionary as you would any other value – by its key.

my_dict = {
    'my_nested_dict': {
        'a_key': 'a_value',
        'another_key': 'another_value',
    }
}

It’s polite to use whitespace in a way that clearly indicates nested layers while maintaining consistency with Python best practices. The specific format may be determined by an IDE auto-formatter, or a pre-deployment linter.

Now, we can refer to the nested dictionary by its key:

my_variable = my_dict['my_nested_dict']
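
To reach a value inside the nested dictionary, the keys can be chained. The short sketch below repeats the same example data so that it runs on its own.

```python
my_dict = {
    'my_nested_dict': {
        'a_key': 'a_value',
        'another_key': 'another_value',
    }
}

# Chain the keys to reach a value inside the nested dictionary.
inner_value = my_dict['my_nested_dict']['a_key']
print(inner_value)  # a_value

# Assigning through the chain updates the nested dictionary in place.
my_dict['my_nested_dict']['a_key'] = 'a_new_value'
print(my_dict['my_nested_dict']['a_key'])  # a_new_value
```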

The Dictionary Comprehension – Less is More

A more advanced technique for defining a dictionary is using the Python dictionary comprehension. Like a list comprehension, a dictionary comprehension generates a dynamically-sized dictionary in a format more concise than the notation above:

automatic_dictionary = {key: value for (key, value) in < some_iterable >}

Any iterable object that could be associated in terms of keys and values, a list of tuples for example, easily becomes a dictionary with a single line of code. Depending on the size of the iterable, the dictionary comprehension notation can be a space-saver (and a lifesaver!) making code that much more “Pythonic.”
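
Here is one way that might look in practice, assuming a list of tuples as the iterable. The fruit names and counts are made up for illustration.

```python
# A list of (key, value) tuples; the data is invented for this example.
pairs = [('apples', 3), ('pears', 0), ('plums', 7)]

# Build a dictionary directly from the pairs.
inventory = {name: count for (name, count) in pairs}

# A comprehension can also transform or filter while it builds.
in_stock = {name.upper(): count for name, count in pairs if count > 0}

print(inventory)  # {'apples': 3, 'pears': 0, 'plums': 7}
print(in_stock)   # {'APPLES': 3, 'PLUMS': 7}
```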

Practical Use Cases

You can check out Kite’s GitHub repository to easily access the code from this post and others from their Python series.

Let’s say we need to quickly model and store some data without the boiler-plate of a class or hairy SQL statements. For example, we need to store some data about users of a website.

A User class might look like…

class User(object):
    """ Stores info about Users """

    def __init__(self, name, email, address, password, url):
        self.name = name
        self.email = email
        ...

    def send_email(self):
        """ Send an email to our user"""
        pass

    def __repr__(self):
        """Logic to properly format data"""

bill = User('Bill', 'bill@gmail.com', '123 Acme Dr.', 'secret-password',
            'http://www.bill.com')
bill.send_email()

Such a class could have all kinds of features, and developers could argue over whether to use the new @dataclass feature, or whether we want class or instance methods, etc. With a dictionary, there is less overhead:

bill = {'email': 'bill@gmail.com',
        'address': '123 Acme Dr.',
        'password': 'secret-password',
        'url': 'http://www.bill.com'}

def send_email(email_address):
    # smtp email logic …
    pass

send_email(bill['email'])  # bracket notation or …
send_email(bill.get('email'))  # .get() method is handy, too

Now we can work with bill’s data as intuitively as we would with a User object, with about half the code.

Iterating through Data Stored in Dictionaries

Because JSON responses are often lists of dictionaries (perhaps parsed from an API response), we can iterate through such a list to create some User instances.

json_response = [{
    'id': 1,
    'first_name': 'Florentia',
    'last_name': 'Schell',
    'email': 'fschelle0@nyu.edu',
    'url': 'https://wired.com'
}, {
    'id': 2,
    'first_name': 'Montague',
    'last_name': 'McAteer',
    'email': 'mmcateer1@zdnet.com',
    'url': 'https://domainmarket.com'
}, {
    'id': 3,
    'first_name': 'Dav',
    'last_name': 'Yurin',
    'email': 'dyurin2@e-recht24.de',
    'url': 'http://wufoo.com'
}]

Notice the natural structure of dictionaries as rows of data. We can easily iterate through these rows to create our User objects.

users = []
for i in json_response:
    users.append(User(
        name=i['first_name'] + ' ' + i['last_name'],
        email=i['email'],
        url=i['url'],
        # ...
    ))
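
A single dictionary can be iterated as well. The sketch below uses one illustrative row, shortened for brevity, to show the difference between looping over the dictionary itself and looping over its .items().

```python
user = {
    'id': 1,
    'first_name': 'Florentia',
    'email': 'fschelle0@nyu.edu',
}

# Iterating over a dictionary directly yields its keys.
keys_seen = [key for key in user]

# .items() yields (key, value) pairs; .values() would yield only the values.
lines = []
for key, value in user.items():
    lines.append(f'{key}: {value}')

print(keys_seen)  # ['id', 'first_name', 'email']
print(lines)      # ['id: 1', 'first_name: Florentia', 'email: fschelle0@nyu.edu']
```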

Dictionaries as Nested Data Structures

Compared to lists, Python dictionaries may seem at first to be rigid and unforgiving: a veritable soup of colons and brackets. However, compared to data stored in a relational database (where values must comply with specific constraints in order to make relationships possible), dictionaries are extremely flexible.

For one thing, a value in a dictionary can be any Python object, and collections of objects are often instantiated with values from a dictionary. Values are related to other values by simply “attaching” them: one value is placed in a list or dictionary that is stored under the other value as its key. Although a dictionary created this way may seem complex, it’s often simpler to pull specific values out of a dictionary than to write a SQL query.

Because of their structure, Python dictionaries are a good way of understanding other nested data formats (like JSON or XML). These are often referred to as non-relational, a term that covers everything except relational databases such as MySQL and PostgreSQL.

The advantage of less rigid structures is that specific values are easily accessible. The disadvantage is that sets of values on a corresponding “level” of nesting under other keys are more difficult to relate to each other, and the resulting code is more verbose. If data naturally falls into columns and rows, then something like a Pandas DataFrame or a NumPy ndarray would be more appropriate, allowing values to be referenced by their row and column position.
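
The trade-off can be sketched with a small nested dictionary. The users structure below is hypothetical and only meant to show the two access patterns.

```python
# Hypothetical nested data: each user key holds a dictionary of details.
users = {
    'florentia': {'email': 'fschelle0@nyu.edu', 'url': 'https://wired.com'},
    'montague': {'email': 'mmcateer1@zdnet.com', 'url': 'https://domainmarket.com'},
}

# Reaching one specific value takes a single chained lookup.
one_url = users['montague']['url']

# Collecting the same field across every user needs an explicit loop
# (here a comprehension), because the values live under different keys.
all_emails = [details['email'] for details in users.values()]

print(one_url)     # https://domainmarket.com
print(all_emails)  # ['fschelle0@nyu.edu', 'mmcateer1@zdnet.com']
```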

The Python Home for JSON

While there are some subtle differences between Python dictionaries and JSON (JavaScript Object Notation), the similarities between the two data structures are a major bonus for developers consuming data from other sources. In fact, calling the .json() method on a response from the requests library returns the parsed result as Python objects, typically a dictionary or a list of dictionaries.

Recently, JSON has become the de facto medium for data exchange via an API, with markup languages like XML and YAML trailing by a significant margin. This lead is most likely due to the prevalence of JavaScript, and the need for web services to be able to “speak” JavaScript to other web services. According to some, JSON is simply less work to unpack.

Luckily, or perhaps by design, Python lends itself well to consuming JSON via its native data structure: the Python dictionary. That being said, here are some of the differences:

  1. JSON is for Serialization: While Python developers are used to manipulating Python objects in memory, JSON is a different story. Instead, JSON is a standard for serializing all sorts of data to send like a telegram over HTTP. Once JSON makes it across the wire, it can be deserialized, or loaded into a Python object.
  2. JSON can be a String: Before JSON objects make it into Python logic, they are strings usually sent as a response to an HTTP request, and then parsed in various ways. JSON responses usually look like lists of dictionaries surrounded by quotes. Conveniently, lists of dictionaries can be easily parsed into even more useful objects like Pandas DataFrames (Pandas is a powerful data analysis tool for Python). Whenever loading and dumping (serializing) JSON objects, at some point they will become strings in Python.
  3. Duplicate Keys: Python dictionary keys must be unique. In other words, some_dictionary.keys() will never contain the same key twice. JSON does not strictly enforce this, which is a bit unusual as it seems to defeat the purpose of keys in the first place, but no one ever said JSON was pythonic. Duplicate keys must be explicitly handled when converting JSON to a Python object, or only one key-value pair will make it through (with Python’s built-in json module, the last one).
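
These differences can be sketched with Python’s built-in json module. The payload string and the reject_duplicates helper below are illustrative assumptions, not part of any real API; object_pairs_hook is a standard json.loads parameter.

```python
import json

# Hypothetical JSON payload, as it might arrive in an HTTP response body.
payload = '[{"id": 1, "first_name": "Florentia"}, {"id": 2, "first_name": "Montague"}]'

# Deserialize: the JSON string becomes a list of Python dictionaries.
rows = json.loads(payload)

# Serialize: the Python objects become a JSON string again.
as_text = json.dumps(rows)

# With duplicate keys, json.loads keeps only the last pair by default.
last_wins = json.loads('{"a": 1, "a": 2}')


def reject_duplicates(pairs):
    """Raise ValueError if a JSON object repeats a key (illustrative helper)."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f'duplicate key: {key!r}')
        result[key] = value
    return result


# object_pairs_hook receives every (key, value) pair, duplicates included.
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

print(rows[0]['first_name'])  # Florentia
print(last_wins)              # {'a': 2}
print(duplicate_detected)     # True
```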

Pitfalls and Dictionary-like Alternatives

Dictionaries are incredibly useful, but some aspects of the language specification cause dictionaries to seem to misbehave. For example, when iterating through a dictionary, a developer may reference a key-value pair that hasn’t been defined. Instead of returning “None,” the Python dictionary will raise a KeyError and print out a traceback, halting execution entirely if the error is not handled. This behavior can slow the development cycle.

>>> print(my_dict['my_key'])
Traceback (most recent call last):
  File '<input>', line 1, in <module>
KeyError: 'my_key'

Since a program may often just need to “check” for the existence of a key-value pair without throwing an error, a developer has other options. The first is to import the defaultdict object from the collections module, a handy dict subclass that supplies a default value for any missing key. Rather than raising an error, it calls a factory function, stores the result under the missing key, and returns it.
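
A minimal sketch of defaultdict, assuming int as the factory so that missing keys start at zero. The word list is made up for illustration.

```python
from collections import defaultdict

# int() returns 0, so any missing key starts at zero.
word_counts = defaultdict(int)
for word in ['spam', 'eggs', 'spam']:
    word_counts[word] += 1

# Reading a missing key returns the default and also inserts the key.
missing = word_counts['bacon']

# To check for a key without inserting it, use the in operator.
has_toast = 'toast' in word_counts

print(word_counts['spam'])  # 2
print(missing)              # 0
print(has_toast)            # False
```

Note that reading a missing key adds it to the defaultdict, which may or may not be what the surrounding code expects.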

Secondly, the .get() method on a standard dictionary returns the value passed as the second argument when the key is missing (or None if no second argument is given). So, instead of bracket notation, referencing a value looks like …

>>> just_checking = my_dict.get('my_key', None)
>>> print(just_checking)
None

Much better!

OrderedDict

Before Python 3.7, dictionaries were defined as “unordered” collections of key-value pairs, which could be inconvenient. To add ordered behavior, we have the OrderedDict, also from the collections module. As the name implies, an OrderedDict returns pairs in the order they were inserted.

While not as lightweight as the standard dictionary, many developers prefer to use OrderedDict, as it behaves in a more predictable way. In older versions of Python, iterating through a standard dictionary could return the key-value pairs in an arbitrary order; since Python 3.7, standard dictionaries are guaranteed to preserve insertion order as well. An OrderedDict still offers extras such as order-sensitive equality comparisons and a move_to_end() method, which can be helpful when the position of specific pairs matters. Proponents of defaultdict and OrderedDict don’t ask “Why?” – they ask “Why not?”
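
As a small sketch of what OrderedDict adds on top of a plain dictionary, the example below compares two dictionaries that hold the same pairs in a different order and then reorders a key. The keys and values are arbitrary.

```python
from collections import OrderedDict

first = OrderedDict([('a', 1), ('b', 2)])
second = OrderedDict([('b', 2), ('a', 1)])

# Two OrderedDicts compare equal only if their order matches as well.
ordered_equal = first == second

# Plain dictionaries ignore order when compared.
plain_equal = dict(first) == dict(second)

# move_to_end() moves an existing key to the last position.
first.move_to_end('a')
keys_after_move = list(first)

print(ordered_equal)    # False
print(plain_equal)      # True
print(keys_after_move)  # ['b', 'a']
```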

Performance Considerations

Are you seeing poor performance in your Python application? Stop iterating through lists, and start referencing values in a dictionary.

Technically, the function of a dictionary could be emulated with the use of lists. Creating key-value pairs with lists is often an introductory programming exercise. However, it’s critical to a high-level language like Python to have a high-performing implementation. One reason is that dictionaries are used internally by the Python language implementation itself.

Another reason is that dictionary lookups are much faster than searching a list. In a Python list, to locate a specific item, each item must be checked until a match is found, so the cost grows with the length of the list. A dictionary hashes the key and goes almost directly to the item (or object, or collection) associated with it, so a lookup takes roughly constant time on average regardless of size. For large collections, this can improve performance dramatically, often by orders of magnitude.
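
One rough way to see the difference is with the standard timeit module. The collection size and repetition count below are arbitrary choices, and the actual timings will vary by machine, so treat this as a sketch rather than a benchmark.

```python
import timeit

size = 100_000  # arbitrary size chosen for illustration
as_list = list(range(size))
as_dict = {number: True for number in as_list}
target = size - 1  # worst case for the list: the last element

# A list membership test scans the items one by one.
list_seconds = timeit.timeit(lambda: target in as_list, number=100)

# A dictionary hashes the key and looks it up directly.
dict_seconds = timeit.timeit(lambda: target in as_dict, number=100)

print(f'list: {list_seconds:.6f}s  dict: {dict_seconds:.6f}s')
```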

Where to Go From Here…

The best way to get to know dictionaries is to get some practice! Try iterating through dictionaries, storing the keys and values in separate lists, and then re-assigning them to each other in the proper order.
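
As a starting point for that exercise, here is one possible sketch using only built-ins. The example dictionary is arbitrary.

```python
my_dict = {'key1': 1, 'key2': 2, 'key3': 3}

# Split the dictionary into two parallel lists.
keys = list(my_dict.keys())
values = list(my_dict.values())

# zip() pairs the items back up by position, and dict() rebuilds the mapping.
rebuilt = dict(zip(keys, values))

print(keys)                # ['key1', 'key2', 'key3']
print(values)              # [1, 2, 3]
print(rebuilt == my_dict)  # True
```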

Try creating interesting series of objects from dictionaries, and dictionaries from objects. If you had to store 1,000 rows of data in a dictionary, what might be a good Python pattern to approach the problem?

Before running to Stack Exchange, think about the nature of a dictionary. Are the keys unique values, or can they be repeated? If they are unique, what type of Python collection could best store the values? Now, try searching for the canonical solutions. Of course, don’t forget to check out the official Python documentation on dictionaries:

https://docs.python.org/3/tutorial/datastructures.html

The Python dictionary is a fundamental data structure in Python, and is a core component of the Python language specification. When treated with care, dictionaries become high-performance tools for storing and accessing complex data in an explicit, readable, and, most importantly, pythonic way.