Python Training by Dan Bader

A Python Riddle: The Craziest Dict Expression in the West

Let’s pry apart this slightly unintuitive Python dictionary expression to find out what’s going on in the uncharted depths of the Python interpreter.

Sometimes you strike upon a tiny code example that has real depth to it—a single line of code that can teach you a lot about a programming language if you ponder it enough. Such a code snippet feels like a Zen kōan: a question or statement used in Zen practice to provoke doubt and test the student’s progress.

The tiny little code snippet we’ll discuss in this tutorial is one such example. Upon first glance, it might seem like a straightforward dictionary expression, but when considered at close range, it takes you on a mind-expanding journey through the CPython interpreter.

I get such a kick out of this little one-liner that at one point I had it printed on my Python conference badges as a conversation starter. It also led to some rewarding conversations with members of my Python newsletter.

So without further ado, here is the code snippet. Take a moment to reflect on the following dictionary expression and what it will evaluate to:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}

I’ll wait here…

Ok, ready?

This is the result we get when evaluating the above dict expression in a CPython interpreter session:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

I’ll admit I was pretty surprised about this result the first time I saw it. But it all makes sense when you investigate what happens, step by step. So, let’s think about why we get this—I want to say slightly unintuitive—result.

Where Baby Dictionaries Come From

When Python processes our dictionary expression, it first constructs a new empty dictionary object; and then it assigns the keys and values to it in the order given in the dict expression.

Therefore, when we break it down, our dict expression is equivalent to this sequence of statements that are executed in order:

>>> xs = dict()
>>> xs[True] = 'yes'
>>> xs[1] = 'no'
>>> xs[1.0] = 'maybe'

Oddly enough, Python considers all dictionary keys used in this example to be equal:

>>> True == 1 == 1.0
True

Okay, but wait a minute here. I’m sure you can intuitively accept that 1.0 == 1, but why would True be considered equal to 1 as well? The first time I saw this dictionary expression it really stumped me.

After doing some digging in the Python documentation, I learned that Python treats bool as a subclass of int. This is the case in Python 2 and Python 3:

“The Boolean type is a subtype of the integer type, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings ‘False’ or ‘True’ are returned, respectively.” (Source)

And yes, this means you can technically use bools as indexes into a list or tuple in Python:

>>> ['no', 'yes'][True]
'yes'

But you probably should not use boolean variables like that for the sake of clarity (and the sanity of your colleagues.)

Anyway, let’s come back to our dictionary expression.

As far as Python is concerned, True, 1, and 1.0 all represent the same dictionary key. As the interpreter evaluates the dictionary expression, it repeatedly overwrites the value for the key True. This explains why, in the end, the resulting dictionary only contains a single key.

Before we move on, let’s have another look at the original dictionary expression:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

Why do we still get True as the key here? Shouldn’t the key also change to 1.0 at the end, due to the repeated assignments?

After some mode research in the CPython interpreter source code, I learned that Python’s dictionaries don’t update the key object itself when a new value is associated with it:

>>> ys = {1.0: 'no'}
>>> ys[True] = 'yes'
>>> ys
{1.0: 'yes'}

Of course this makes sense as a performance optimization—if the keys are considered identical, then why spend time updating the original? In the last example you saw that the initial True object is never replaced as the key. Therefore, the dictionary’s string representation still prints the key as True (instead of 1 or 1.0.)

With what we know now, it looks like the values in the resulting dict are getting overwritten only because they compare as equal. However, it turns out that this effect isn’t caused by the __eq__ equality check alone, either.

Wait, What About the Hash Code?

Python dictionaries are backed by a hash table data structure. When I first saw this surprising dictionary expression, my hunch was that this behavior had something to do with hash collisions.

You see, a hash table internally stores the keys it contains in different “buckets” according to each key’s hash value. The hash value is derived from the key as a numeric value of a fixed length that uniquely identifies the key.

This allows for fast lookups. It’s much quicker to search for a key’s numeric hash value in a lookup table instead of comparing the full key object against all other keys and checking for equality.

However, the way hash values are typically calculated isn’t perfect. And eventually, two or more keys that are actually different will have the same derived hash value, and they will end up in the same lookup table bucket.

If two keys have the same hash value, that’s called a hash collision, and it’s a special case that the hash table’s algorithms for inserting and finding elements need to handle.

Based on that assessment, it’s fairly likely that hashing has something to do with the surprising result we got from our dictionary expression. So let’s find out if the keys’ hash values also play a role here.

I’m defining the following class as our little detective tool:

class AlwaysEquals:
     def __eq__(self, other):
         return True

     def __hash__(self):
         return id(self)

This class is special in two ways.

First, because its __eq__ dunder method always returns True, all instances of this class will pretend they’re equal to any other object:

>>> AlwaysEquals() == AlwaysEquals()
True
>>> AlwaysEquals() == 42
True
>>> AlwaysEquals() == 'waaat?'
True

And second, each AlwaysEquals instance will also return a unique hash value generated by the built-in id() function:

>>> objects = [AlwaysEquals(),
               AlwaysEquals(),
               AlwaysEquals()]
>>> [hash(obj) for obj in objects]
[4574298968, 4574287912, 4574287072]

In CPython, id() returns the address of the object in memory, which is guaranteed to be unique.

With this class we can now create objects that pretend to be equal to any other object but have a unique hash value associated with them. That’ll allow us to test if dictionary keys are overwritten based on their equality comparison result alone.

And, as you can see, the keys in the next example are not getting overwritten, even though they always compare as equal:

>>> {AlwaysEquals(): 'yes', AlwaysEquals(): 'no'}
{ <AlwaysEquals object at 0x110a3c588>: 'yes',
  <AlwaysEquals object at 0x110a3cf98>: 'no' }

We can also flip this idea around and check to see if returning the same hash value is enough to cause keys to get overwritten:

class SameHash:
    def __hash__(self):
        return 1

Instances of this SameHash class will compare as non-equal with each other but they will all share the same hash value of 1:

>>> a = SameHash()
>>> b = SameHash()
>>> a == b
False
>>> hash(a), hash(b)
(1, 1)

Let’s look at how Python’s dictionaries react when we attempt to use instances of the SameHash class as dictionary keys:

>>> {a: 'a', b: 'b'}
{ <SameHash instance at 0x7f7159020cb0>: 'a',
  <SameHash instance at 0x7f7159020cf8>: 'b' }

As this example shows, the “keys get overwritten” effect isn’t caused by hash value collisions alone either.

Umm Okay, What’s the Executive Summary Here?

Python dictionaries check for equality and compare the hash value to determine if two keys are the same. Let’s try and summarize the findings of our investigation:

The {True: 'yes', 1: 'no', 1.0: 'maybe'} dictionary expression evaluates to {True: 'maybe'} because the keys True, 1, and 1.0 all compare as equal, and they all have the same hash value:

>>> True == 1 == 1.0
True
>>> (hash(True), hash(1), hash(1.0))
(1, 1, 1)

Perhaps not-so-surprising anymore, that’s how we ended up with this result as the dictionary’s final state:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

We touched on a lot of subjects here, and this particular Python Trick can be be a bit mind-boggling at first—that’s why I compared it to a Zen kōan in the beginning.

If it’s difficult to understand what’s going on in this tutorial, try playing through the code examples one by one in a Python interpreter session. You’ll be rewarded with an expanded knowledge of Python’s internals.

It’s a Python Trick!

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

There’s one more thing I want to tell you about:

I’ve started a series of these Python “tricks” delivered over email. You can sign up at dbader.org/python-tricks and I’ll send you a new Python trick as a code screenshot every couple of days.

This is still an experiment and a work in progress but I’ve heard some really positive feedback from the developers who’ve tried it out so far.

Thanks to JayR, Murat, and kurashu89 for their feedback on this article.

<strong><em>Improve Your Python</em></strong> with a fresh 🐍 <strong>Python Trick</strong> 💌 every couple of days

Improve Your Python with a fresh 🐍 Python Trick 💌 every couple of days

🔒 No spam ever. Unsubscribe any time.

This article was filed under: programming, and python.

Related Articles:
  • Records, Structs, and Data Transfer Objects in Python – How to implement records, structs, and “plain old data objects” in Python using only built-in data types and classes from the standard library.
  • Python Iterators: A Step-By-Step Introduction – Understanding iterators is a milestone for any serious Pythonista. With this step-by-step tutorial you’ll understanding class-based iterators in Python, completely from scratch.
  • Comprehending Python’s Comprehensions – One of my favorite features in Python are list comprehensions. They can seem a bit arcane at first but when you break them down they are actually a very simple construct.
  • Writing Clean Python With Namedtuples – Python comes with a specialized “namedtuple” container type that doesn’t seem to get the attention it deserves. It’s one of these amazing features in Python that’s hidden in plain sight.
  • Python’s Functions Are First-Class – Python’s functions are first-class objects. You can assign them to variables, store them in data structures, pass them as arguments to other functions, and even return them as values from other functions.
Latest Articles:
← Browse All Articles