Compare Python dataframe columns to variables containing None values
I'm comparing a dataframe to multiple variables as follows...
if not ( (df['Column1'] == variableA) & (df['Column2'] == variableB) & (df['Column3'] == variableC) ).any():
print("row not in dataframe")
Both my dataframe and variables may contain a None value, but I've noticed that when 2 None values are compared, they do not return True and so I'm printing "not in list" even though both df and variables hold the same (None) value.
Any thoughts on the best way around this issue would be really appreciated. Perhaps I have to convert the None values to a string that will return True when compared?
Solution 1:
The reason is that asserting equality for None
is not possible with a simple equality operator.
Consider the following example:
s = pd.Series([1, "a", None])
s == 1
Out[4]:
0 True
1 False
2 False
dtype: bool
s == "a"
Out[5]:
0 False
1 True
2 False
dtype: bool
s == None
Out[6]:
0 False
1 False
2 False
dtype: bool
s.isna()
Out[7]:
0 False
1 False
2 True
dtype: bool
So if you want to collect any potential equality between None
s, you need to check whether the values are na
.
If you have two series (eg two cols of a dataframe), you will need to create a union of results as:
d = pd.Series([2, "a", None])
s == d
Out[12]:
0 False
1 True
2 False
dtype: bool
(s == d) | (s.isna() & d.isna())
Out[13]:
0 False
1 True
2 True
dtype: bool
This means that the solution for your problem would be something like:
(df['Column1'] == variableA) & (df['Column2'] == variableB) & (df['Column3'] == variableC)
|
(df['Column1'].isna() & df['Column2'].isna() & df['Column3'].isna())