pandas or python equivalent of tidyr complete
I have data that looks like this:
library("tidyverse")
df <- tibble(user = c(1, 1, 2, 3, 3, 3), x = c("a", "b", "a", "a", "c", "d"), y = 1)
df
# user x y
# 1 1 a 1
# 2 1 b 1
# 3 2 a 1
# 4 3 a 1
# 5 3 c 1
# 6 3 d 1
Python format:
import pandas as pd
df = pd.DataFrame({'user':[1, 1, 2, 3, 3, 3], 'x':['a', 'b', 'a', 'a', 'c', 'd'], 'y':1})
I'd like to "complete" the data frame so that every user has a record for every possible x, with missing y values filled with a default of 0.
This is straightforward in R (tidyverse/tidyr):
df %>%
  complete(nesting(user), x = c("a", "b", "c", "d"), fill = list(y = 0))
# user x y
# 1 1 a 1
# 2 1 b 1
# 3 1 c 0
# 4 1 d 0
# 5 2 a 1
# 6 2 b 0
# 7 2 c 0
# 8 2 d 0
# 9 3 a 1
# 10 3 b 0
# 11 3 c 1
# 12 3 d 1
Is there a complete equivalent in pandas / python that will yield the same result?
Solution 1:
You can use reindex with a MultiIndex built by MultiIndex.from_product:
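# Index by the key columns, then reindex to every (user, x) combination
df = df.set_index(['user','x'])
mux = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]], names=['user','x'])
# Combinations that were missing get y = 0
df = df.reindex(mux, fill_value=0).reset_index()
print(df)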
user x y
0 1 a 1
1 1 b 1
2 1 c 0
3 1 d 0
4 2 a 1
5 2 b 0
6 2 c 0
7 2 d 0
8 3 a 1
9 3 b 0
10 3 c 1
11 3 d 1
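The tidyr call in the question passes the x values explicitly, while the reindex approach above takes them from the data. A minimal sketch of the explicit variant, assuming the desired x values are known up front (the names x_values and completed, and the extra value 'e', are illustrative and not from the original post):

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Explicit x values, like x = c("a", "b", "c", "d") in the tidyr call.
# 'e' is a hypothetical value that never occurs in the data.
x_values = ['a', 'b', 'c', 'd', 'e']

# Every observed user crossed with every requested x value
mux = pd.MultiIndex.from_product([df['user'].unique(), x_values],
                                 names=['user', 'x'])

# reindex needs the (user, x) pairs in df to be unique
completed = df.set_index(['user', 'x']).reindex(mux, fill_value=0).reset_index()
print(completed)
```

Because the levels are given explicitly, values that no user has in the data still get a row for every user.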
Or set_index + unstack + stack:
df = df.set_index(['user','x'])['y'].unstack(fill_value=0).stack().reset_index(name='y')
print(df)
user x y
0 1 a 1
1 1 b 1
2 1 c 0
3 1 d 0
4 2 a 1
5 2 b 0
6 2 c 0
7 2 d 0
8 3 a 1
9 3 b 0
10 3 c 1
11 3 d 1
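Both approaches fill the new rows with a single value, whereas tidyr's fill = list(y = 0) is specified per column. A rough sketch of a reusable helper with per-column fills, built on the same reindex idea; the function name complete, its parameters, and the extra note column are hypothetical and only for illustration:

```python
import pandas as pd

def complete(df, keys, fill):
    """Hypothetical helper: add a row for every combination of the values in `keys`.

    `fill` maps column names to the value used in newly created rows.
    """
    # Observed values of each key column, in order of first appearance
    levels = [df[k].unique() for k in keys]
    mux = pd.MultiIndex.from_product(levels, names=keys)
    # New rows are NaN after reindex; fillna then applies the per-column defaults
    out = df.set_index(keys).reindex(mux).reset_index()
    return out.fillna(fill)

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1,
                   'note': 'seen'})

result = complete(df, ['user', 'x'], {'y': 0, 'note': 'missing'})
print(result)
```

One trade-off to be aware of: because the new rows pass through NaN, an integer column such as y comes back as float; cast it with astype(int) afterwards if the original dtype matters. Note also that fillna replaces NaN values that were already present in the original rows.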