ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
This error is usually triggered when filtering a dataframe for rows that match one or more conditions. Let's consider the example dataframe below:
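The dataframe itself is not shown here, so the following is a minimal sketch that rebuilds it from the rows appearing in the result tables throughout this article. The name `df` matches the expressions used later on; the construction via a dictionary is an assumption.

```python
import pandas as pd

# Example data reconstructed from the result tables in this article
df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

print(df)
```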
If we want to retrieve the cars with prices less than 20,000, we might try the following:
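The failing snippet is not shown, so here is a sketch of the kind of code that raises this error, assuming the example dataframe is named `df`. The `try`/`except` wrapper is only there so the example runs to completion and prints the message.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

message = None
try:
    # `if` needs one truth value, but the comparison yields a whole Series
    if df['price'] < 20000:
        print(df)
except ValueError as err:
    message = str(err)
    print(message)
```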
This error occurs because the `if` statement requires a truth value, i.e., a statement evaluating to `True` or `False`. In the above example, the `<` operator used against a dataframe column returns a boolean series containing a combination of `True` and `False` values. Here's what the result actually looks like:
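A small sketch of that comparison on its own, assuming the same example dataframe `df`:

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# The comparison is evaluated row by row and returns one boolean per row
result = df['price'] < 20000
print(result)
```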
Since a series is returned, Python doesn't know which value to use, meaning that the series has an ambiguous truth value.
Instead, we can pass this statement into dataframe brackets to get the desired values:
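As a sketch, assuming the same example dataframe `df` (the variable name `cheap_cars` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# Indexing with the boolean series keeps only the rows where it is True
cheap_cars = df[df['price'] < 20000]
print(cheap_cars)
```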
| | manufacturer | model | price | mileage |
|---|---|---|---|---|
| 1 | Kia | Rio | 12500 | 4500 |
We can also match multiple conditions using `|` for or and `&` for and:
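The snippet for this step is not shown. The sketch below assumes the two conditions are a price under 30,000 and mileage under 2,000, which reproduces the rows in the table that follows.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# Assumed conditions: both must hold for a row to be kept
matches = df[(df['price'] < 30000) & (df['mileage'] < 2000)]
print(matches)
```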
| | manufacturer | model | price | mileage |
|---|---|---|---|---|
| 0 | BMW | 1 Series | 28000 | 1800 |
| 3 | Audi | A3 | 26500 | 700 |
Let's go further in depth on different solutions for this error.
Cause 1: Looking for rows that meet a single condition
Let's say we want to get all cars priced under 30,000 using the following boolean series:
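A minimal sketch of that boolean series, assuming the example dataframe `df` and storing the result under the illustrative name `mask`:

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# One boolean per row: True where the price is below 30,000
mask = df['price'] < 30000
print(mask)
```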
A boolean series like this is known as a mask. By passing this mask to the same dataframe, we get back only the rows of the dataframe that have a `True` value at the matching index in our boolean series.
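Applied to the example data, this could look like the sketch below (again assuming `df`, with `mask` and `under_30k` as illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

mask = df['price'] < 30000

# Rows are kept where the mask is True at the same index label
under_30k = df[mask]
print(under_30k)
```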
| | manufacturer | model | price | mileage |
|---|---|---|---|---|
| 0 | BMW | 1 Series | 28000 | 1800 |
| 1 | Kia | Rio | 12500 | 4500 |
| 3 | Audi | A3 | 26500 | 700 |
The rows with indexes of 0, 1, and 3 all have a `True` value in our mask. Therefore, these are the rows our statement above returns.
Using any() and all()
`any()` and `all()` are two ways to obtain a single truth value based on a mask.
For example, we can use the `.any()` method to return `True` if any of the values in a mask are `True`:
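A short sketch, assuming the example dataframe `df` and the under-30,000 mask from the previous section. Because `.any()` collapses the mask to one value, it is safe to use in an `if` statement.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

mask = df['price'] < 30000

# True, because at least one car is priced under 30,000
any_under_30k = mask.any()
if any_under_30k:
    print('At least one car costs less than 30,000')
```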
Similarly, we can use `.all()`, which will return `True` only when all of the values in a mask are `True`:
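The same mask can illustrate `.all()`. In this sketch (assuming the example dataframe `df`), the result is `False` because the Mercedes is priced at exactly 30,000, which is not below the threshold.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

mask = df['price'] < 30000

# False, because one row (index 2) does not satisfy the condition
all_under_30k = mask.all()
print(all_under_30k)
```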
Cause 2: Looking for rows that meet multiple conditions
Building on our example from the previous section, let's try to find cars that cost less than 30,000 and have mileage under 2,000. Using the solution from the first section, we might try to build on it like this:
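The original snippet is not shown. A sketch of the attempt, assuming the example dataframe `df`, with a `try`/`except` wrapper added only so the example runs to completion:

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

message = None
try:
    # `and` asks the left-hand Series for a single truth value, which fails
    result = df[df['price'] < 30000 and df['mileage'] < 2000]
except ValueError as err:
    message = str(err)
    print(message)
```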
Notice that we're getting an error again. This time, it's because the `and` keyword needs a single truth value for each of its operands: Python interprets the statement as "return `True` if `df['price'] < 30000` and `df['mileage'] < 2000`". We know that `df['price'] < 30000` and `df['mileage'] < 2000` both return a mask, so the truth value is ambiguous here.
To resolve this issue, we need to replace `and` with `&`:
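A sketch of the corrected version, assuming the example dataframe `df` (the mask names are illustrative). Each comparison is wrapped in parentheses because `&` binds more tightly than `<` in Python, so the comparisons must be grouped explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

price_mask = df['price'] < 30000
mileage_mask = df['mileage'] < 2000

# Element-wise AND of the two masks; equivalent to the one-line form
both = df[price_mask & mileage_mask]
one_line = df[(df['price'] < 30000) & (df['mileage'] < 2000)]
print(both)
```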
| | manufacturer | model | price | mileage |
|---|---|---|---|---|
| 0 | BMW | 1 Series | 28000 | 1800 |
| 3 | Audi | A3 | 26500 | 700 |
The `&` symbol is a bitwise operator. On boolean series, pandas applies it element by element, so the two masks are combined row by row. Indexing with the result returns a copy of the dataframe containing only the rows where both conditions are `True`.
By using the `|` operator in place of `or`, we can return a copy containing rows that have a `True` value in the mask generated by either condition, as shown:
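The conditions used for this example are not shown. The sketch below assumes a price under 20,000 or mileage under 1,000, a hypothetical pair chosen because it reproduces the rows in the table that follows.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# Assumed thresholds: a row is kept if at least one condition holds
either = df[(df['price'] < 20000) | (df['mileage'] < 1000)]
print(either)
```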
| | manufacturer | model | price | mileage |
|---|---|---|---|---|
| 1 | Kia | Rio | 12500 | 4500 |
| 2 | Mercedes | A-Class | 30000 | 400 |
| 3 | Audi | A3 | 26500 | 700 |
Furthermore, we can also use the `~` operator, which is the bitwise equivalent of `not`:
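The exact expression is not shown. One sketch that reproduces the rows in the table that follows is to negate the compound mask from the `&` example; this choice of conditions is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# ~ flips every value in the mask, keeping rows that fail the combined test
inverted = df[~((df['price'] < 30000) & (df['mileage'] < 2000))]
print(inverted)
```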
| | manufacturer | model | price | mileage |
|---|---|---|---|---|
| 1 | Kia | Rio | 12500 | 4500 |
| 2 | Mercedes | A-Class | 30000 | 400 |
The `~` operator inverts whatever mask comes after it, which here is the compound mask in the parentheses: every `True` becomes `False` and vice versa.
Summary:
This value error is caused by using a mask (boolean series) in place of a single truth value. A mask has values that are either `True` or `False`, varying from row to row. As a result, Python can't determine whether the series as a whole is `True` or `False`; its truth value is ambiguous.
When searching for dataframe rows that only match a single condition, we can avoid the error with masking, using `df[]` and placing the statement generating the mask within the brackets, for example, `df[df['price'] < 30000]`.
When looking for rows that match multiple conditions, we avoid the error by replacing the keywords `and`, `or`, and `not` with their respective bitwise operators, `&`, `|`, and `~`.