Python Pandas TypeError: unhashable type: 'Series'
Why does this happen?
This error occurs when attempting to use a Pandas Series object in a place where a _hashable_ object is expected.
For example, if you were to try to use a Series as a dictionary key:
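A minimal sketch of that situation follows. The Series contents and the name lookup are illustrative assumptions, and the failing line is wrapped in try/except so the snippet runs to the end.

```python
import pandas as pd

series = pd.Series(["a", "b", "c"])  # illustrative data

try:
    # Building the dictionary hashes the key, which a Series does not support
    lookup = {series: "value"}
except TypeError as err:
    lookup = None
    print(f"TypeError: {err}")
```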
The error occurs because dictionary keys must be hashable, which means they must be immutable (unchanging).
The most notable places in Python where you must use a hashable object are dictionary keys, set elements, and Pandas Index values, including DataFrame columns. Since a Series object is not hashable, it won't work for any of these cases.
Below, we'll explore two main situations:
- Intentional use of a Series, where we'll look at ways to convert a Series into something hashable
- Accidental use of a Series, where we'll explore two cases where unintentional Series objects are commonly produced: slicing DataFrames and using iterrows.
Before getting to the possible causes, let's understand hashability. If you're comfortable with hashing already, feel free to skip to the solutions section.
Hashability and why Series aren't hashable
A hash code is an integer that represents an object. Objects can be translated into hash codes by passing them to the built-in hash() function, like so:
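For instance, a short sketch with a few built-in values (the variable names are illustrative):

```python
text_hash = hash("One")   # an integer; for strings it differs between Python sessions
int_hash = hash(1)        # small integers hash to themselves
float_hash = hash(3.0)    # equal numbers share a hash, so this equals hash(3)

print(text_hash)
print(int_hash)    # 1
print(float_hash)  # 3
```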
The hash function returns an integer derived from the object's value. Among the built-in types, only immutable objects can be hashed. The hash of an object never changes during its lifetime: within one Python session, the hash of "One" will always be the same integer, although string hashes differ from one session to the next by default.
Immutable objects
Dictionaries, sets, lists, and Series are mutable and, therefore, cannot be hashed. Conversely, numeric types, booleans, and strings are immutable, so they can all be hashed. Tuples are also immutable but can only be hashed if their elements and subelements are also immutable.
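The tuple rule can be checked directly. In this sketch, the variable names are illustrative and the failing call is wrapped in try/except so the snippet keeps running.

```python
# A tuple of immutable elements is hashable
flat_hash = hash((1, "a"))

# A tuple that contains a list is not, because the list is unhashable
nested_failed = False
try:
    hash((1, [2, 3]))
except TypeError as err:
    nested_failed = True
    print(f"TypeError: {err}")  # TypeError: unhashable type: 'list'
```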
We can test whether any object is hashable by passing it to hash(). Let's try it on a Series object:
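A sketch of that check, using made-up Series contents. The exact wording of the message can vary between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

series = pd.Series([1, 2, 3])  # illustrative data

hash_failed = False
try:
    hash(series)
except TypeError as err:
    hash_failed = True
    print(f"TypeError: {err}")
```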
Since a Series object is mutable, pandas does not give it a hash, so Python raises a TypeError.
For a more detailed description of how hash codes are used in Python, check out Brandon Craig Rhodes' 2010 PyCon talk, The Mighty Dictionary.
Now that we understand hashability, we can discuss the possible causes of the unhashable type: 'Series' error.
Cause 1: Assigning Series Objects to Dictionary Keys, Set Elements, or Pandas Index Values
Dictionary keys, set elements, and Pandas Index values are all required to be of a hashable type. As mentioned, using a Pandas Series in any of these places will cause an error.
Bringing back the intro example, let's try using a Series object as a key in a dictionary:
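The sketch below tries a dictionary key assignment and, for comparison, a set element. The data and the names scores, unique, and failures are illustrative assumptions.

```python
import pandas as pd

series = pd.Series([1, 2, 3])  # illustrative data
failures = []

try:
    scores = {}
    scores[series] = "some value"  # Series as a dictionary key
except TypeError:
    failures.append("dict key")

try:
    unique = {series}  # Series as a set element
except TypeError:
    failures.append("set element")

print(failures)  # ['dict key', 'set element']
```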
Dictionaries and sets quickly raise the error, but DataFrames and Series may overlook such mistakes at first. Take a look at the two Series objects below:
In correct_series, elements of series got matched to the elements of the index, whereas in the faulty_series, we assigned series as a value to index. The latter assignment should be forbidden, yet, there is no error message.
However, when we interact with the index in some way, we see an error:
Attempting to rename the first index returned an error message. The code itself doesn't cause an error until we attempt to use the faulty structure of the index.
Solution
We have to replace our Series object with something hashable. A named tuple is an ideal hashable alternative to the Series since it also uses key-value pairing.
Below, we convert each Series into a named tuple before using it as a dictionary key:
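One way this could look is sketched here. The Series contents, the Movie tuple name, and the rating values are all made up for illustration.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical Series objects we would like to use as keys
first = pd.Series({"name": "War Dogs", "year": 2016})
second = pd.Series({"name": "Joker", "year": 2019})

# The index labels become the field names of the named tuple
Movie = namedtuple("Movie", list(first.index))

# Unpacking a Series yields its values in index order
first_key = Movie(*first)
second_key = Movie(*second)

# Named tuples of hashable values are hashable, so this works
ratings = {first_key: 7.1, second_key: 8.4}
```

Note that the index labels must be valid Python identifiers for namedtuple to accept them as field names.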
We've essentially frozen our Series into named tuples, allowing them to be hashed and used as dictionary keys.
Now, we can access values in the dictionary using one of the named tuples:
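A lookup might look like the sketch below. It rebuilds a one-entry dictionary so that it runs on its own, and the data, the Movie name, and the rating are again illustrative.

```python
from collections import namedtuple

import pandas as pd

row = pd.Series({"name": "Joker", "year": 2019})  # hypothetical data
Movie = namedtuple("Movie", list(row.index))
ratings = {Movie(*row): 8.4}

# Any named tuple with equal field values finds the same entry
key = Movie(name="Joker", year=2019)
found = ratings[key]
print(found)  # 8.4
```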
Cause 2: Slicing the DataFrame Wrong
You may be accidentally using a Series where a hashable object is expected. One common scenario is trying to extract a scalar from a DataFrame but ending up with a Series due to incorrect slicing.
To demonstrate, let's make a simple movies DataFrame:
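A sketch of one way to build it, with the values taken from the table in this section:

```python
import pandas as pd

movies = pd.DataFrame({
    "Name": ["War Dogs", "Money Ball", "The Irishman", "Joker",
             "The Wolf of Wall Street"],
    "Director": ["Todd Phillips", "Bennett Miller", "Martin Scorsese",
                 "Todd Phillips", "Martin Scorsese"],
    "Year": [2016, 2011, 2019, 2019, 2013],
})

print(movies)
```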
| | Name | Director | Year |
|---|---|---|---|
| 0 | War Dogs | Todd Phillips | 2016 |
| 1 | Money Ball | Bennett Miller | 2011 |
| 2 | The Irishman | Martin Scorsese | 2019 |
| 3 | Joker | Todd Phillips | 2019 |
| 4 | The Wolf of Wall Street | Martin Scorsese | 2013 |
The code below intends to count the times each director's name was mentioned in the movies DataFrame and report it in dictionary format.
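The failing loop can be sketched as follows. This is a reconstruction under assumptions rather than the exact original snippet: only the Director column is rebuilt, and the loop is wrapped in try/except so the error is captured instead of stopping the script.

```python
import pandas as pd

# Only the Director column is needed for this count
movies = pd.DataFrame({
    "Director": ["Todd Phillips", "Bennett Miller", "Martin Scorsese",
                 "Todd Phillips", "Martin Scorsese"],
})

mentions = {}
error = None
try:
    for i in movies.index:
        # The list selector ['Director'] makes loc return a Series
        director_name = movies.loc[i, ["Director"]]
        mentions[director_name] = mentions.get(director_name, 0) + 1
except TypeError as err:
    error = err
    print(f"TypeError: {err}")
```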
The error message claims there is a problem with using director_name as a key to the mentions dictionary. Even though we meant to extract a string, the program passed a Series object as director_name.
Solution
Let's squeeze in a print statement before the erroneous line and look at director_name.
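A sketch of that inspection for the first row only, assuming the same Director data as in the table:

```python
import pandas as pd

movies = pd.DataFrame({
    "Director": ["Todd Phillips", "Bennett Miller", "Martin Scorsese",
                 "Todd Phillips", "Martin Scorsese"],
})

director_name = movies.loc[0, ["Director"]]

print(type(director_name))  # a pandas Series, not a str
print(director_name)        # a one-element Series holding 'Todd Phillips'
```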
We are indeed getting the one value but in Series format. This is because of the brackets surrounding our column selection.
Even though the brackets have one label inside (['Director']), loc deemed it a list and expected multiple values. It, therefore, created a Series object.
Let's get rid of the brackets and rerun the same code:
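With a plain label as the column selector, the loop could look like this sketch (same assumed Director data as in the table):

```python
import pandas as pd

movies = pd.DataFrame({
    "Director": ["Todd Phillips", "Bennett Miller", "Martin Scorsese",
                 "Todd Phillips", "Martin Scorsese"],
})

mentions = {}
for i in movies.index:
    # A scalar label returns the cell value itself, here a string
    director_name = movies.loc[i, "Director"]
    mentions[director_name] = mentions.get(director_name, 0) + 1

print(mentions)
# {'Todd Phillips': 2, 'Bennett Miller': 1, 'Martin Scorsese': 2}
```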
We now get the expected output.
Cause 3: Not Unpacking Iterrows
Let's simulate this scenario using the movies DataFrame again:
Like before, we'll try to count each time a director was mentioned in movies and report it in dictionary format. This time, we'll use iterrows() to iterate through the DataFrame.
Here, the error claims we have passed an unhashable value to loc[]. Since Indexes can only hold hashable values, loc expects a hashable selector, so row seems to be a problem.
Solution
Let's squeeze in a print statement before the erroneous line and look at the row.
We are getting a tuple with two values: an index of 0 and the row itself.
This is because iterrows() returns a tuple of the format: [Hashable, Series] for each row it iterates through. While the Hashable holds the row's index label, the Series holds the row's data.
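To make the tuple structure concrete, this sketch pulls the first item out of iterrows() and inspects its two parts. The two-column DataFrame and the names first_item, index_label, and row_data are illustrative assumptions.

```python
import pandas as pd

movies = pd.DataFrame({
    "Name": ["War Dogs", "Money Ball"],
    "Director": ["Todd Phillips", "Bennett Miller"],
})

# iterrows() is a generator; next() fetches its first item
first_item = next(movies.iterrows())
index_label, row_data = first_item

print(type(first_item))  # <class 'tuple'>
print(index_label)       # 0
print(type(row_data))    # a pandas Series holding the row's values
```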
Proper use of iterrows requires us to unpack its return value like so:
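A sketch of the unpacked loop, assuming the same Director data as in the movies table. Here the director is read from the row Series directly, which may differ from the original snippet.

```python
import pandas as pd

movies = pd.DataFrame({
    "Director": ["Todd Phillips", "Bennett Miller", "Martin Scorsese",
                 "Todd Phillips", "Martin Scorsese"],
})

mentions = {}
for _, row in movies.iterrows():
    # row is the Series of row data; indexing it by label gives a string
    director_name = row["Director"]
    mentions[director_name] = mentions.get(director_name, 0) + 1

print(mentions)
# {'Todd Phillips': 2, 'Bennett Miller': 1, 'Martin Scorsese': 2}
```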
Since we don't need the value for index, we're using an underscore (_) to throw it away. And we now have the same mentions count as before.
Summary
Python requires dictionary keys and set elements to be hashable, and Pandas requires the same of Index values. Since a Series is unhashable, it cannot be used in any of these places.
The Series may also be unintended. Slicing a DataFrame incorrectly or using iterrows without unpacking its return value can produce a Series where a scalar was expected.