Guide: Type Hinting in Python 3.5

Giovanni Lanzani
August 14, 2019

Introduction

With the release of version 3.5, Python has introduced type hints: code annotations that, through additional tooling, can check if you’re using your code correctly.

Long-time Python users might cringe at the thought of new code needing type hinting to work properly, but we need not worry: Guido himself wrote in PEP 484, “no type checking happens at runtime.”

The feature has been proposed mainly to open up Python code for easier static analysis and refactoring.

For data science, and for the data scientist, type hinting is invaluable for a couple of reasons:

  • It makes it much easier to understand the code, just by looking at the signature, i.e. the first line(s) of the function definition;
  • It creates a documentation layer that can be checked with a type checker, i.e. if you change the implementation, but forget to change the types, the type checker will (hopefully) yell at you.

Of course, as is always the case with documentation and testing, it’s an investment: it costs you more time at the beginning, but saves you (and your co-worker) a lot in the long run.

Note: Type hinting has also been ported to Python 2.7 (a.k.a. Legacy Python). There, however, the hints have to be written as comments. Furthermore, no one should be using Legacy Python in 2019: it’s less beautiful and only has a couple more months of updates before it stops receiving support of any kind.
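
As a rough illustration of the comment-based form mentioned in the note: PEP 484 lets the hints live in a trailing "# type:" comment instead of in the signature. The function name greet is made up for this sketch. The code also runs unchanged on Python 3, where the comment is simply ignored at runtime.

```python
# Comment-style hints (PEP 484), usable where annotation syntax is unavailable.
# greet is a hypothetical example function, not part of the article's code.
def greet(name='Joe'):
    # type: (str) -> str
    return 'Hello {}'.format(name)

print(greet('Mark'))  # prints: Hello Mark
```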

Getting started with types

The code for this article may be found at Kite’s GitHub repository.

The hello world of type hinting is

# hello_world.py
def hello_world(name: str = 'Joe') -> str:
    return f'Hello {name}'

We have added two type hint elements here. The first one is : str after name and the second one is -> str towards the end of the signature.

The syntax works as you would expect: we’re marking name to be of type str and we’re specifying that the hello_world function should output a str. If we use our function, it does what it says:

> hello_world(name='Mark')
'Hello Mark'

Since Python remains a dynamically typed language that does not check the hints at runtime, we can still shoot ourselves in the foot:

> hello_world(name=2)
'Hello 2'

What’s happening? Well, as I wrote in the introduction, no type checking happens at runtime.

So as long as the code doesn’t raise an exception, things will continue to work fine.
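
To make this concrete, here is a minimal sketch (reusing the hello_world function from above) showing that the hints are stored as plain metadata on the function object and are never consulted when the function is called:

```python
def hello_world(name: str = 'Joe') -> str:
    return f'Hello {name}'

# The hints end up in the function's __annotations__ dictionary...
print(hello_world.__annotations__)
# {'name': <class 'str'>, 'return': <class 'str'>}

# ...but nothing reads them at call time, so passing an int still works.
# A static type checker is what would flag this call.
print(hello_world(name=2))  # prints: Hello 2
```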

What should you do with these type definitions then? Well, you need a type checker, or an IDE that reads and checks the types in your code (PyCharm, for example).

Type checking your program

There are at least four major type checker implementations: Mypy, Pyright, Pyre, and Pytype:

  • Mypy is actively developed by, among others, Guido van Rossum, Python’s creator;
  • Pyright has been developed by Microsoft and integrates very well with their excellent Visual Studio Code;
  • Pyre has been developed by Facebook with the goal to be fast (even though Mypy recently got much faster);
  • Pytype has been developed by Google and, besides checking the types as the others do, it can run type checks (and add annotations) on unannotated code.

Since we want to focus on how to use typing from a Python perspective, we’ll use Mypy in this tutorial. We can install it using pip (or your package manager of choice):

$ pip install mypy
$ mypy hello_world.py 

Right now our life is easy: there isn’t much that can go wrong in our hello_world function. We’ll see later how this might not be the case anymore.

More advanced types

In principle, all Python classes are valid types, meaning you can use str, int, float, etc. Using dictionaries, tuples, and similar containers is also possible, but you need to import their generic counterparts (Dict, Tuple, and so on) from the typing module.

# tree.py
from typing import Tuple, Iterable, Dict, List, DefaultDict
from collections import defaultdict

def create_tree(tuples: Iterable[Tuple[int, int]]) -> DefaultDict[int, List[int]]:
    """
    Return a tree given tuples of (child, father)

    The tree structure is as follows:

        tree = {node_1: [node_2, node_3],
                node_2: [node_4, node_5, node_6],
                node_6: [node_7, node_8]}
    """
    tree = defaultdict(list)
    for child, father in tuples:
        if father:
            tree[father].append(child)
    return tree

print(create_tree([(2.0,1.0), (3.0,1.0), (4.0,3.0), (1.0,6.0)]))
# will print
# defaultdict(<class 'list'>, {1.0: [2.0, 3.0], 3.0: [4.0], 6.0: [1.0]})

While the code is simple, it introduces a couple of extra elements:

  • First of all, the Iterable type for the tuples variable. This type indicates that the object should conform to the collections.abc.Iterable specification (i.e. implement __iter__). This is needed because we iterate over tuples in the for loop;
  • We specify the types inside our container objects: the Iterable contains Tuple, the Tuples are composed of pairs of int, and so on.
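
To illustrate what the Iterable annotation buys us, here is a small sketch built around a made-up helper, sum_pairs, which is not part of tree.py. Because the parameter is only required to be iterable, a list, a generator, and a set are all acceptable arguments:

```python
from typing import Iterable, Tuple

# sum_pairs is a hypothetical helper used only for this illustration.
def sum_pairs(pairs: Iterable[Tuple[int, int]]) -> int:
    """Add up every number in an iterable of integer pairs."""
    return sum(a + b for a, b in pairs)

print(sum_pairs([(1, 2), (3, 4)]))            # a list: prints 10
print(sum_pairs((i, i) for i in range(3)))    # a generator: prints 6
print(sum_pairs({(5, 5)}))                    # a set: prints 10
```

Any of these three kinds of argument would equally satisfy the tuples parameter of create_tree in tree.py.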

Ok, let’s try to type check it!

$ mypy tree.py
tree.py:14: error: Need type annotation for 'tree'

Uh-oh, what’s happening? Basically Mypy is complaining about this line:

tree = defaultdict(list)

While we know that the return type should be DefaultDict[int, List[int]], Mypy cannot infer that tree is indeed of that type. We need to help it out by specifying tree’s type, which we can do much as we do in the signature:

tree: DefaultDict[int, List[int]] = defaultdict(list)

If we now re-run Mypy, all is well:

$ mypy tree.py
$
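
The same annotation syntax works for any variable. It is mostly needed where the checker has nothing to infer from, such as empty containers. A small sketch, with variable names invented for illustration:

```python
from typing import Dict, List

# Empty containers give a checker no element type to infer, so we state it.
names: List[str] = []
ages: Dict[str, int] = {}

names.append('Ada')
ages['Ada'] = 36

# No annotation is needed when the value already makes the type obvious.
greeting = 'hello'

print(names, ages, greeting)
```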

Type aliases

Sometimes our code reuses the same composite types over and over again. In the above example, Tuple[int, int] might be such a case. To make our intent clearer (and shorten our code), we can use type aliases. Type aliases are very easy to use: we just assign a type to a variable, and use that variable as the new type:

Relation = Tuple[int, int]

def create_tree(tuples: Iterable[Relation]) -> DefaultDict[int, List[int]]:
    """
    Return a tree given tuples of (child, father)

    The tree structure is as follows:

        tree = {node_1: [node_2, node_3],
                node_2: [node_4, node_5, node_6],
                node_6: [node_7, node_8]}
    """
    # map each father to the list of its children
    tree: DefaultDict[int, List[int]] = defaultdict(list)
    for child, father in tuples:
        if father:
            tree[father].append(child)

    return tree
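
Aliases can be stacked to name larger structures as well. The sketch below assumes two aliases, Relation and Tree, and a hypothetical helper children_of. At runtime an alias is just another name for the same type object:

```python
from typing import Dict, List, Tuple

Relation = Tuple[int, int]      # a (child, father) pair
Tree = Dict[int, List[int]]     # father -> list of children

# children_of is a made-up helper for illustration only.
def children_of(tree: Tree, node: int) -> List[int]:
    """Return the children of node, or an empty list for a leaf."""
    return tree.get(node, [])

tree: Tree = {1: [2, 3], 3: [4]}
print(children_of(tree, 1))             # prints [2, 3]
print(children_of(tree, 4))             # prints []
print(Relation == Tuple[int, int])      # prints True
```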

Generics

Experienced programmers of statically typed languages might have noticed that defining a Relation as a tuple of integers is a bit restricting. Can’t create_tree work with a float, or a string, or the ad-hoc class that we just created?

In principle, there’s nothing that prevents us from using it like that:

# tree.py
from typing import Tuple, Iterable, Dict, List, DefaultDict
from collections import defaultdict

Relation = Tuple[int, int]

def create_tree(tuples: Iterable[Relation]) -> DefaultDict[int, List[int]]:
    ...

print(create_tree([(2.0,1.0), (3.0,1.0), (4.0,3.0), (1.0,6.0)]))
# will print
# defaultdict(<class 'list'>, {1.0: [2.0, 3.0], 3.0: [4.0], 6.0: [1.0]})

However if we ask Mypy’s opinion of the code, we’ll get an error:

$ mypy tree.py
tree.py:24: error: List item 0 has incompatible type 'Tuple[float, float]'; expected 'Tuple[int, int]'
...

There is a way in Python to fix this. It’s called TypeVar, and it works by creating a generic type that makes no assumptions about the concrete type: it only requires that the type stays the same wherever the variable appears in a given signature. Usage is pretty simple:

from typing import TypeVar

T = TypeVar('T')

Relation = Tuple[T, T]

# Relation[T] ties the alias to T; a bare Relation would be read as Tuple[Any, Any]
def create_tree(tuples: Iterable[Relation[T]]) -> DefaultDict[T, List[T]]:
    ...
    tree: DefaultDict[T, List[T]] = defaultdict(list)
    ...

print(create_tree([(2.0,1.0), (3.0,1.0), (4.0,3.0), (1.0,6.0)]))

Now, Mypy will no longer complain, and programmers will be happy as the type hints for create_tree correctly reflect that create_tree works for more than just integers.

Note that the string 'T' passed to TypeVar must match the name of the variable it is assigned to, T.
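
A smaller example may help show what a TypeVar expresses. In the sketch below, the hypothetical function first says that whatever element type goes in is the type that comes out, so a checker can track int and str separately through the same function:

```python
from typing import Sequence, TypeVar

T = TypeVar('T')

# first is a made-up function for illustration.
def first(items: Sequence[T]) -> T:
    """Return the first element; the result has the element type of items."""
    return items[0]

number = first([1, 2, 3])    # a checker infers int here
word = first(['a', 'b'])     # and str here

print(number + 1, word.upper())  # prints: 2 A
```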

Generic classes: Should I have used a TypeVar?

What I said about create_tree at the beginning of this section is not 100% accurate. Since T will be used as a key to a dictionary, it needs to be hashable.

This is important as the key lookup in Python works by computing the hash of the key. If the key is not hashable, the lookup will raise a TypeError.

Such properties are encountered enough that Python offers a few types which can indicate that an object should have certain properties (e.g. it should be hashable if we want it to be a key to a dictionary). 

Some examples:

  • typing.Iterable will indicate that we expect the object to be an iterable;
  • typing.Iterator will indicate that we expect the object to be an iterator;
  • typing.Reversible will indicate that we expect the object to be reversible;
  • typing.Hashable will indicate that we expect the object to implement __hash__;
  • typing.Sized will indicate that we expect the object to implement __len__;
  • typing.Sequence will indicate that we expect the object to be Sized, Iterable, Reversible, and to implement __getitem__, count, and index.

These are important, because sometimes we expect to use those methods on our object, but don’t care which particular class they belong to as long as they have the methods needed. For example, if we’d like to create our own version of chain to chain sequences together, we could do the following:

from typing import Iterable, TypeVar

T = TypeVar('T')

def return_values() -> Iterable[float]:
    yield 4.0
    yield 5.0
    yield 6.0

def chain(*args: Iterable[T]) -> Iterable[T]:
    for arg in args:
        yield from arg

print(list(chain([1, 2, 3], return_values(), 'string')))
# will print
# [1, 2, 3, 4.0, 5.0, 6.0, 's', 't', 'r', 'i', 'n', 'g']

The return_values function is a bit contrived but it illustrates the point: the function chain doesn’t care about who we are as long as we’re iterable!
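
Coming back to create_tree and its hashability requirement: one possible way to state it in the types is to bound the TypeVar by Hashable. This is a sketch of that idea, not necessarily how the article's repository does it. The name K is an assumption made for this example, and a checker should then reject unhashable keys such as lists.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, List, Tuple, TypeVar

# K may be any type, as long as it is hashable (assumed name for this sketch).
K = TypeVar('K', bound=Hashable)

def create_tree(tuples: Iterable[Tuple[K, K]]) -> DefaultDict[K, List[K]]:
    """Return a tree given tuples of (child, father)."""
    tree: DefaultDict[K, List[K]] = defaultdict(list)
    for child, father in tuples:
        if father:
            tree[father].append(child)
    return tree

tree = create_tree([('b', 'a'), ('c', 'a'), ('d', 'c')])
print(dict(tree))  # prints: {'a': ['b', 'c'], 'c': ['d']}
```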

Any, Union, and Optional

Python provides another couple of features that are handy when writing code with type hints:

  • Any does what you think it does, marking the object as not having any specific type;
  • Union can be used as Union[A, B] to indicate that the object can have type A or B;
  • Optional is used as Optional[A] to indicate that the object is either of type A or None. It effectively works as a Union[A, None]. Functional programming lovers will recognize their beloved Option (if you come from Scala) or Maybe (if you come from Haskell). Contrary to real functional languages, however, we can’t expect safety when sending Optionals around: nothing forces us to handle the None case at runtime, so beware.
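
A short sketch of Union and Optional working together. The functions safe_divide and describe are invented for this example. The explicit None check is what lets a type checker treat the value as a plain float afterwards.

```python
from typing import Optional, Union

Number = Union[int, float]

# safe_divide and describe are hypothetical helpers for illustration.
def safe_divide(a: Number, b: Number) -> Optional[float]:
    """Return a / b, or None when b is zero."""
    if b == 0:
        return None
    return a / b

def describe(result: Optional[float]) -> str:
    # Checking for None first lets a type checker narrow result to float.
    if result is None:
        return 'undefined'
    return f'{result:.2f}'

print(describe(safe_divide(1, 4)))  # prints: 0.25
print(describe(safe_divide(1, 0)))  # prints: undefined
```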

Callables

Python supports passing functions as arguments to other functions, but how should we annotate them? 

The solution is to use Callable[[arg1, arg2], return_type]. If we do not want to spell out the arguments, we can replace the list with an ellipsis, Callable[..., return_type], which leaves the argument list unchecked.
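
Both forms can be sketched in a few lines. All function names below are made up for illustration. apply_twice spells out the argument list, while run_and_report uses the ellipsis form and so accepts a callable with any parameters.

```python
from typing import Callable

def apply_twice(func: Callable[[int], int], value: int) -> int:
    """Call func on value, then again on the result."""
    return func(func(value))

def add_three(x: int) -> int:
    return x + 3

def run_and_report(task: Callable[..., str]) -> str:
    # With an ellipsis, the checker does not inspect the arguments we pass.
    return task('report', retries=2)

def describe(name: str, retries: int = 0) -> str:
    return f'{name} (retries={retries})'

print(apply_twice(add_three, 1))   # prints: 7
print(run_and_report(describe))    # prints: report (retries=2)
```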

As an example, let’s assume we want to write our own map/reduce function (different from Hadoop’s MapReduce!). We could do it with type annotations like this:

# mr.py
from functools import reduce
from typing import Callable, Iterable, TypeVar, Union, Optional

T = TypeVar('T')
S = TypeVar('S')
Number = Union[int, float]

def map_reduce(
    it: Iterable[T],
    mapper: Callable[[T], S],
    reducer: Callable[[S, S], S],
    filterer: Optional[Callable[[S], bool]]
) -> S:
    mapped = map(mapper, it)
    filtered = filter(filterer, mapped)
    reduced = reduce(reducer, filtered)
    return reduced


def mapper(x: Number) -> Number:
    return x ** 2


def filterer(x: Number) -> bool:
    return x % 2 == 0


def reducer(x: Number, y: Number) -> Number:
    return x + y


results = map_reduce(
    range(10),
    mapper=mapper,
    reducer=reducer,
    filterer=filterer
)
print(results)

Just by looking at the signature of map_reduce we can understand how data flows through the function: the mapper gets a T and outputs an S, the filterer, if not None, filters the Ss, and the reducer combines the Ss into the ultimate S.

Combined with proper naming, type hints can clarify what the function does without looking at the implementation.
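
To see T and S take different concrete types, here is a sketch that repeats the map_reduce definition so it runs on its own, then feeds it strings (T is str) that are mapped to their lengths (S is int). The helpers is_even and add and the word list are assumptions made for this example.

```python
from functools import reduce
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar('T')
S = TypeVar('S')

def map_reduce(
    it: Iterable[T],
    mapper: Callable[[T], S],
    reducer: Callable[[S, S], S],
    filterer: Optional[Callable[[S], bool]]
) -> S:
    mapped = map(mapper, it)
    filtered = filter(filterer, mapped)
    return reduce(reducer, filtered)

def is_even(n: int) -> bool:
    return n % 2 == 0

def add(a: int, b: int) -> int:
    return a + b

words = ['type', 'hints', 'are', 'optional']
# Lengths are 4, 5, 3, 8; the even ones (4 and 8) are summed.
total = map_reduce(words, mapper=len, reducer=add, filterer=is_even)
print(total)  # prints: 12
```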

External modules

Annotating our code is nice, but what about all the other modules we might be using? Data scientists often import from, say, NumPy or pandas. Can we annotate functions accepting NumPy arrays as input?

Well, there’s only one way to find out:

# rescale.py
import numpy as np

def rescale_from_to(array1d: np.ndarray,
                    from_: float = 0.0, to: float = 5.0) -> np.ndarray:
    min_ = np.min(array1d)
    max_ = np.max(array1d)
    rescaled = (array1d - min_) * (to - from_) / (max_ - min_) + from_
    return rescaled

# np.array is a function; the type of the object it returns is np.ndarray
my_array: np.ndarray = np.array([1, 2, 3, 4])

rescaled_array = rescale_from_to(my_array)

We can now type check it:

$ mypy rescale.py
rescale.py:1: error: No library stub file for module 'numpy'
rescale.py:1: note: (Stub files are from https://github.com/python/typeshed)

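It’s failing on line 1 already! What’s happening here is that, at the time of writing, numpy does not ship type annotations, so it’s impossible for Mypy to know how to do the checking (note from the error message that the whole standard library has type annotations through the typeshed project). NumPy releases from 1.20 onward do include annotations, so recent installations should no longer trigger this error.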

There are a couple of ways to fix this:

  • Use mypy --ignore-missing-imports rescale.py on the command line. This has the drawback that it will ignore mistakes as well (misspelling the package name, for example);
  • Append # type: ignore to the import line:
    import numpy as np  # type: ignore
  • Create a .mypy.ini file in our home folder (or a mypy.ini in the folder where our project is) with the following content:
    # mypy.ini
    [mypy]
    [mypy-numpy]
    ignore_missing_imports = True

I am personally a big fan of the third option, because once a module adds type support, we can remove its entry from a single file and be done with it. Moreover, if we use a mypy.ini in the folder where the project is, we can put that into version control and have every coworker share the same configuration.

Conclusion

We learned how to create functions and modules with type hints, and the various possibilities of complex types, generics, and TypeVar. Furthermore, we looked at how a type checker such as Mypy can help us catch early mistakes in our code.

Type hints are — and probably will remain — an optional feature in Python. We don’t have to cover our whole code with type hinting to start, and this is one of the main selling points of using types in Python.

Instead, we can start by annotating functions and variables here and there, and gradually start enjoying code that has all the advantages of type hinting.

As you use type hinting more and more, you will experience how it can help create code that’s easier for others to interpret, catch bugs early on, and maintain a cleaner API.

If you want to know more about type hints, the Mypy documentation has an excellent type system reference.