Replacing NaN with None in a pandas DataFrame MultiIndex
I am trying to replace NaN with None in a pandas DataFrame MultiIndex. It seems that None is converted to NaN in a MultiIndex (but not in other index types).
The following does not work (taken from the question "Replace NaN in DataFrame index"):
import numpy as np
import pandas as pd
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index(['c1','c2'], inplace=True)
df.index = pd.MultiIndex.from_frame(df.index.to_frame().fillna(np.nan).replace([np.nan], [None]))
df
          c3
c1 c2
a  True    1
b  True    2
c  False   3
d  NaN     4
type(df.index[3][1])
<class 'float'>
Neither does this:
index_tuples = [tuple(row) for row in df.index.to_frame().fillna(np.nan).replace([np.nan], [None]).values]
pd.MultiIndex.from_tuples(index_tuples)
MultiIndex([('a', True),
('b', True),
('c', False),
('d', nan)],
)
type(df.index[3][1])
<class 'float'>
It seems None is converted to NaN in MultiIndex.
PS. It works for other index types:
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index('c2', inplace=True)
>>> df
      c1  c3
c2
True   a   1
True   b   2
False  c   3
NaN    d   4
>>> df.index = df.index.fillna(value=np.nan).to_series().replace([np.nan], [None])
>>> df
      c1  c3
c2
True   a   1
True   b   2
False  c   3
NaN    d   4
>>> type(df.index[3])
<class 'NoneType'>
Solution 1:
The only way I managed to do it is by manipulating the underlying numpy array directly. It seems that any None value assigned to a MultiIndex in pandas is converted to NaN.
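A likely reason for the conversion, as far as I can tell, is that a MultiIndex does not store the tuples themselves. It stores the distinct labels per level (levels) plus integer positions into them (codes), and a missing value is recorded as code -1. The small index below is made up for illustration only:

```python
import numpy as np
import pandas as pd

# Build a small MultiIndex whose second level contains a missing value.
mi = pd.MultiIndex.from_tuples([("a", True), ("d", None)], names=["c1", "c2"])

# Labels and positions are stored separately.
print(mi.levels[1])  # the distinct non-missing labels of level "c2"
print(mi.codes[1])   # integer positions into levels[1]; -1 marks a missing value

# Scalar access rebuilds the tuple from levels and codes, so the
# missing entry comes back as a float NaN rather than None.
second = mi[1][1]
print(type(second), second)
```

Because None never reaches the stored levels, from_frame and from_tuples cannot preserve it. The workaround that follows edits the array of tuples returned by .values instead.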
import pandas as pd
import numpy as np
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index(['c1','c2'], inplace=True)
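def replace_nan(x):
    # Return a copy of the tuple x with every NaN entry replaced by None.
    new_x = []
    for v in x:
        try:
            if np.isnan(v):
                new_x.append(None)
            else:
                new_x.append(v)
        except TypeError:
            # np.isnan rejects non-numeric values such as strings; keep them as-is.
            new_x.append(v)
    return tuple(new_x)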
print('Before:\n', df.index)
idx = df.index.values
idx[:] = np.vectorize(replace_nan, otypes=['object'])(idx) # Replace values in np.array
print('After:\n', df.index)
Result:
Before:
MultiIndex([('a', True),
('b', True),
('c', False),
('d', nan)],
names=['c1', 'c2'])
After:
MultiIndex([('a', True),
('b', True),
('c', False),
('d', None)],
names=['c1', 'c2'])
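If a real MultiIndex is not strictly required, a possible alternative is a flat object Index whose elements are tuples. This is only a sketch under that assumption: the name flat is made up here, and the result no longer supports level-based operations. It does keep None untouched, because the tuples are stored as plain Python objects.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", True, 1], ["b", True, 2], ["c", False, 3], ["d", None, 4]],
    columns=["c1", "c2", "c3"],
)

# Pair up the two key columns; column c2 has object dtype, so None survives.
keys = list(zip(df["c1"], df["c2"]))

# tupleize_cols=False asks pandas for a flat Index of tuples
# instead of converting the tuples into a MultiIndex.
flat = df[["c3"]].copy()
flat.index = pd.Index(keys, tupleize_cols=False)

print(flat.index)
print(type(flat.index[3][1]))
```

The trade-off is that lookups need the full tuple, for example flat.loc[[("d", None)]], and methods that work on individual levels are not available.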