Runtime Type Verification in Python

In this post I advocate for a particular style of Python coding which I call “Runtime Type Verification” (RTV), in order to help you write code that has clearer intent, fewer implicit assumptions, and—hopefully—fewer bugs.

Just to be clear: Python doesn’t need type checking like a statically-typed language. If you are coming to Python from another language with static typing, please don’t try to force those idioms on Python. However, I think it’s useful to deal with types explicitly when they matter, which, as we will see, is a lot of the time.

The Problem

In a nutshell: most (or all) of the methods you write have implicit assumptions about the parameters they accept.

For example, function/method parameters (I’ll use the term “function” to mean both functions and class methods) will by default happily accept None (as would be expected). However, in a lot of cases, probably the majority, functions aren’t designed to deal with None, resulting in the familiar “AttributeError: ‘NoneType’ object has no attribute ‘foo’” exceptions. This is sort of Python’s version of a null reference exception.

Typically people ignore the possibility of None with the rationale that the code will break anyway and some exception will be thrown somewhere. However, we want to fail as early as possible, and RTV helps to make sure that parameter assumptions are enforced.
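To make "fail as early as possible" concrete, here is a minimal sketch. The classes Account and CheckedAccount and the function owner_report are hypothetical names invented for this illustration. The unchecked class stores None silently and the error only shows up later, far from the mistake. The checked class rejects it on the spot.

```python
class Account:
    """No verification: a None owner is stored without complaint."""

    def __init__(self, owner):
        self.owner = owner


class CheckedAccount:
    """RTV style: the assumption about owner is enforced at the boundary."""

    def __init__(self, owner):
        assert isinstance(owner, str) and owner, "owner must be a non-empty string"
        self.owner = owner


def owner_report(accounts):
    # Runs much later, possibly in another module.
    return [account.owner.upper() for account in accounts]


accounts = [Account("ada"), Account(None)]   # the bug is introduced here...
try:
    owner_report(accounts)                   # ...but only surfaces here
except AttributeError as exc:
    print("late failure:", exc)

try:
    CheckedAccount(None)                     # fails at the point of the mistake
except AssertionError as exc:
    print("early failure:", exc)
```

In the second case the traceback points at the line that created the bad object, which is usually where the fix belongs.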

Another example might be a function that expects a dictionary with a specific set of keys, or a list of length between X and Y. The possibilities go on. It’s quite unusual to write a function that has zero knowledge of the arguments passed to it.
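As a rough sketch of what those two assumptions look like when written down, consider the following. The function connect, the key names, and the length bounds are all made up for this illustration.

```python
REQUIRED_KEYS = {"host", "port"}   # hypothetical keys for this sketch
MIN_ITEMS, MAX_ITEMS = 1, 5        # hypothetical bounds (the "X and Y")


def connect(settings, retry_delays):
    # settings must provide every required key.
    missing = REQUIRED_KEYS - set(settings)
    assert not missing, "missing keys: %s" % sorted(missing)
    # retry_delays must hold between MIN_ITEMS and MAX_ITEMS entries.
    assert MIN_ITEMS <= len(retry_delays) <= MAX_ITEMS, "expected 1 to 5 delays"
    return "%s:%d" % (settings["host"], settings["port"])


print(connect({"host": "localhost", "port": 8080}, [1, 2]))
```

Calling connect({"host": "localhost"}, [1]) would stop at the first assert with a message naming the missing key, instead of raising a KeyError somewhere further down.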

You might have a function or method like the following:

Example (rtc_func_ex1.py)
def NewUser(name, categories, attributes):
    user_obj = User(name, categories, attributes)
    return save_to_database(user_obj)

What implicit assumptions does this function make?

  • ‘name’ exists (is not None) and is a string (or string-like)
  • ‘categories’ exists (is not None) and is an iterable like list or tuple. (You might extend the assumption to say that the iterable contains objects of type str or even valid categories that exist)
  • ‘attributes’ is also a container type of some sort (in this case we will say that the function expects it to be a dict) but may be empty or None.

A Solution

Let’s encode all these assumptions in the preamble to the function (and add a docstring while we’re at it):

Typed Example (rtc_typing_example.py)
def NewUser(name, categories, attributes=None):
    '''
    Create a new user.

    :param name: username
    :param categories: iterable with categories the user belongs to
    :param attributes: optional dictionary of attributes
    :return: boolean indicating database write success
    '''
    assert name and categories
    assert isinstance(name, str)
    assert hasattr(categories, '__iter__')
    assert attributes is None or (hasattr(attributes, '__getitem__') and hasattr(attributes, '__setitem__'))

    user_obj = User(name, categories, attributes)
    return save_to_database(user_obj)

Notice the following:

  • We assert that the mandatory arguments exist (this will catch any arguments that are None). The first assert guarantees that both arguments are not None and that empty strings/iterables will be caught.
  • We assert that they have the interface/methods that we expect (more on that below).
  • We allow an optional argument which can be None or a dictionary-like object but nothing else.

Notice in the above example, I did not do either of the following:

assert isinstance(categories, list)    #BAD
assert isinstance(attributes, dict)    #BAD

Why not?

Assert Behavior (or Interface), Not Identity

One of the many beautiful things about Python is that we don’t usually need to care what exact class a given object is, as long as it exposes the methods/behavior (aka interface) that we need. If I wrote the bad example above and a subsequent user of the function passed in a duck-typed dict-like object, the function would fail. That would suck and is unnecessarily restrictive.

Instead, assert the presence of the methods we require. For most uses, the minimum interface of a dictionary-like object is the ‘__getitem__’ and ‘__setitem__’ methods, so we’ll make sure they exist and nothing else. Similarly, the minimum interface of an iterable is the ‘__iter__’ method. We assert the existence of both of those above.
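Here is a small sketch of the difference. AttributeStore is a hypothetical dict-like class written for this illustration. It does not inherit from dict, so an identity check rejects it, while the interface check accepts it.

```python
class AttributeStore:
    """A hypothetical dict-like class that does not inherit from dict."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value


store = AttributeStore()
store["lang"] = "python"

is_dict_instance = isinstance(store, dict)
has_dict_interface = hasattr(store, "__getitem__") and hasattr(store, "__setitem__")
print(is_dict_instance, has_dict_interface)   # → False True
```

Keep in mind that these checks are deliberately loose. A list also has both ‘__getitem__’ and ‘__setitem__’, and in Python 3 a plain string has ‘__iter__’, so both would pass. If that matters for your function, add a further assert that rules them out.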

You could create helper functions to make the code a bit more concise:

Typed Example 2 (rtc_typing_example2.py)
def is_dict_like(obj):
    return hasattr(obj, '__getitem__') and hasattr(obj, '__setitem__')

def is_iterable(obj):
    return hasattr(obj, '__iter__')

def is_optional_dict(obj):
    return obj is None or is_dict_like(obj)

def NewUser(name, categories, attributes=None):
    '''
    Create a new user.

    :param name: username
    :param categories: iterable with categories the user belongs to
    :param attributes: optional dictionary of attributes
    :return: boolean indicating database write success
    '''
    assert name and categories
    assert isinstance(name, str)
    assert is_iterable(categories)
    assert is_optional_dict(attributes)

    user_obj = User(name, categories, attributes)
    return save_to_database(user_obj)

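Arguably, if you want to be more correct, you could use the abstract base classes defined in the collections.abc module (in Python 2 they live directly in the collections module; the aliases in collections were removed in Python 3.10):

import collections.abc
#[...]
assert isinstance(categories, collections.abc.Sequence)
assert attributes is None or isinstance(attributes, collections.abc.Mapping)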
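A quick sketch of how these abstract base classes behave, assuming Python 3, where they are imported from collections.abc. The generator function numbers is made up for this illustration. The point is that the ABCs are not all equally strict, so pick the one that matches what the function actually does with the argument.

```python
import collections.abc


def numbers():
    yield 1
    yield 2


gen = numbers()

# Iterable only asks for __iter__, the same thing the hasattr check looks for.
print(isinstance(gen, collections.abc.Iterable))       # True
# Sequence also requires indexing and len(), so a generator is rejected.
print(isinstance(gen, collections.abc.Sequence))       # False
print(isinstance((1, 2), collections.abc.Sequence))    # True
# Mapping is read-only; MutableMapping adds __setitem__ and friends.
print(isinstance({}, collections.abc.Mapping))         # True
print(isinstance({}, collections.abc.MutableMapping))  # True
```

So Sequence is stricter than "anything I can loop over", and Mapping does not promise that you can assign to keys. Iterable and MutableMapping are the closer matches for the hasattr checks used earlier.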

You’ll notice that in the earlier example I am explicitly testing that “name” is of class str, contradicting the rule. For the base types str, int, float, etc., I don’t see a problem with testing the class directly. There could be instances where this would be wrong (if you’re doing something funky with integer methods for example). YMMV.
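One case where testing a base type directly can surprise you: bool is a subclass of int, so True and False pass an isinstance check against int. A minimal sketch, using a hypothetical set_retries function:

```python
def set_retries(count):
    # bool is a subclass of int, so exclude it explicitly when True/False
    # should not be accepted as a number.
    assert isinstance(count, int) and not isinstance(count, bool)
    return count


print(isinstance(True, int))   # True
print(set_retries(3))          # 3
```

Whether the extra clause is worth it depends on how likely a stray boolean is in your code.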

Redundant Verification

Some might argue that if you follow the RTV pattern religiously you will have a lot of redundant verification going on. If Foo() calls Bar() which calls Baz(), passing certain common parameters down, why bother to check them in all three functions?

The reason is that you want the failure to be caught as early in the call stack as possible after the data error occurs. Bad data will always cause a failure somewhere even with no verification at all. The whole point of RTV is to surface the cause more easily by failing fast.

The other reason is that maybe Foo() and Bar() will be decoupled at some point in the future. You want to make sure those parameters are always verified for all users of the functions.
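The following sketch shows both reasons at once. The lowercase functions foo, bar and baz are hypothetical stand-ins for the Foo(), Bar() and Baz() mentioned above, and the calls list exists only to record how far execution got.

```python
calls = []


def baz(user_id):
    assert isinstance(user_id, int) and user_id > 0
    calls.append("baz")
    return "user-%d" % user_id


def bar(user_id):
    assert isinstance(user_id, int) and user_id > 0
    calls.append("bar")
    return baz(user_id)


def foo(user_id):
    assert isinstance(user_id, int) and user_id > 0
    calls.append("foo")
    return bar(user_id)


try:
    foo("42")   # wrong type: rejected before any work is done
except AssertionError:
    print("rejected in foo(); functions reached:", calls)

# bar() is still protected when someone calls it directly.
print(bar(42))
```

The bad value never gets past the outermost function, and bar() does not rely on foo() having checked for it.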

Taking It Further

Since we are using asserts to verify function parameter types, why not also use them inside function bodies (or at the end, before returning values)?

Let’s modify our example function slightly:

Typed Example 3 (rtc_typing_example3.py)
def is_dict_like(obj):
    return hasattr(obj, '__getitem__') and hasattr(obj, '__setitem__')

def is_iterable(obj):
    return hasattr(obj, '__iter__')

def is_optional_dict(obj):
    return obj is None or is_dict_like(obj)

def NewUser(name, categories, attributes=None):
    '''
    Create a new user.

    :param name: username
    :param categories: iterable with categories the user belongs to
    :param attributes: optional dictionary of attributes
    :return: dictionary containing the user object fields if successful or None
    '''
    assert name and categories
    assert isinstance(name, str)
    assert is_iterable(categories)
    assert is_optional_dict(attributes)

    user_obj = User(name, categories, attributes)
    result = save_to_database(user_obj)
    assert not result or (is_dict_like(result) and 'name' in result and 'categories' in result and 'attributes' in result)
    return result

Notice we’ve changed the return type of save_to_database() to now return a dictionary of user object values if successful or None if there was a failure. Rather than return the value without interrogation, we assert that the return value fits the structure we are expecting.

Note that I’m not saying to follow this exact pattern in every circumstance. There are certainly places where it would be redundant and verbose:

#stupid and unnecessary
list_of_stuff = ["foo", "bar", "baz"]
assert "foo" in list_of_stuff and "bar" in list_of_stuff and "baz" in list_of_stuff

I do think it is worth verifying the results of certain functions/methods if the results are structured, at least moderately complex, and failure is a possibility. This applies especially to third-party functions, where the return type might change unexpectedly.
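If you check structured results often, a small helper keeps the asserts readable. This is only a sketch: has_keys and fetch_profile are hypothetical names, and fetch_profile stands in for whatever third-party call you are wrapping.

```python
def has_keys(obj, *keys):
    """Return True if obj supports item lookup and contains every key."""
    return hasattr(obj, "__getitem__") and all(key in obj for key in keys)


def fetch_profile():
    # Hypothetical stand-in for a third-party call.
    return {"name": "ada", "categories": ["admin"]}


profile = fetch_profile()
# None signals failure; anything else must have the structure we rely on.
assert profile is None or has_keys(profile, "name", "categories")
print(profile["name"])
```

If the library later renames a field, the assert fails right after the call instead of wherever the missing key is first used.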

Other Solutions

Some Pythonistas might point out that optional type checking already exists in Python 3 in the form of function annotations. This allows you to specify function parameter types in function and method definitions. With them you could use a module like typeannotations which raises a TypeError exception in the event of a type mismatch.
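It is worth seeing why an extra module is needed at all. On their own, annotations are just metadata attached to the function, and the interpreter does not check them when the function is called. A minimal sketch, with a hypothetical repeat function:

```python
def repeat(text: str, times: int) -> str:
    return text * times


# The annotations are stored on the function object...
print(repeat.__annotations__)

# ...but nothing enforces them: a list is accepted where str was declared.
print(repeat([0], 2))   # [0, 0]
```

Something has to read those annotations and act on them, whether that is a decorator at runtime or a separate checking tool.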

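There’s also mypy, which started out as “an experimental Python variant” with optional static typing and is now a static type checker for ordinary Python code that uses type hints. Its checks happen before the program runs, not at runtime.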

I have no problem with these solutions, but I like RTV better.

  • Explicit asserts are more flexible. We don’t just care about class type, we also care about things like “is integer in valid range”, “is string of length N”, “is iterable > N items”, etc. All these assumptions should be asserted.
  • See Assert Behavior section above. Most of the time we don’t want to lock parameters to just one explicit class.
  • No need for third party modules.
  • Works in Python 2.x
  • Explicit asserts double as documentation and make code intent more clear. They are right there underneath the docstring and not off in some decorator definition somewhere.

Too Slow?

I don’t think this argument holds much water. If asserts are too slow you are using the wrong language for your project. That said, you can turn asserts into no-ops by passing the -O flag to the Python interpreter. I think this defeats the purpose of writing the type verification in the first place, but it’s an option.
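You can see the effect of the flag for yourself with a short script. This sketch assumes Python 3.5 or later (for subprocess.run) and runs the same one-line program twice in a child interpreter, once normally and once with -O.

```python
import subprocess
import sys

snippet = "assert False, 'checked'"

# Normal run: the assert fires and the child exits with an error code.
normal = subprocess.run(
    [sys.executable, "-c", snippet], stderr=subprocess.DEVNULL
)
# With -O the assert statement is stripped, so the child exits cleanly.
optimized = subprocess.run(
    [sys.executable, "-O", "-c", snippet], stderr=subprocess.DEVNULL
)

print("without -O, exit code:", normal.returncode)
print("with -O, exit code:", optimized.returncode)
```

The second run finishes without complaint, which is exactly why the flag removes the safety net along with the overhead.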
