Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape
Solution 1:
Here's an example DataFrame which show this, it has duplicate values with the same index. The question is, do you want to aggregate these or keep them as multiple rows?
In [11]: df
Out[11]:
0 1 2 3
0 1 2 a 16.86
1 1 2 a 17.18
2 1 4 a 17.03
3 2 5 b 17.28
In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean') # desired?
Out[12]:
2 a b
0 1
1 2 17.02 NaN
4 17.03 NaN
2 5 NaN 17.28
In [13]: df1 = df.set_index([0, 1, 2])
In [14]: df1
Out[14]:
3
0 1 2
1 2 a 16.86
a 17.18
4 a 17.03
2 5 b 17.28
In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape
One solution is to reset_index
(and get back to df
) and use pivot_table
.
In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2 a b
0 1
1 2 17.02 NaN
4 17.03 NaN
2 5 NaN 17.28
Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...
Solution 2:
There's a far more simpler solution to tackle this.
The reason why you get ValueError: Index contains duplicate entries, cannot reshape
is because, once you unstack "Location
", then the remaining index columns "id
" and "date
" combinations are no longer unique.
You can avoid this by retaining the default index column (row #) and while setting the index using "id
", "date
" and "location
", add it in "append
" mode instead of the default overwrite mode.
So use,
e.set_index(['id', 'date', 'location'], append=True)
Once this is done, your index columns will still have the default index along with the set indexes. And unstack
will work.
Let me know how it works out.
Solution 3:
I had such problem. In my case problem was in data - my column 'information' contained 1 unique value and it caused error
UPDATE: to correct work 'pivot' pairs (id_user,information) cannot have duplicates
It works:
df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5],
'information':['phon','phon','phone','phone1','phone','phone1','phone'],
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')
it doesn't work:
df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5],
'information':['phone','phone','phone','phone','phone','phone','phone'],
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')
source: https://stackoverflow.com/a/37021196/6088984