Change value of a given column if another column contains the name of the given column
I am an intermediate learner and I have a pandas dataframe like the one below:
import pandas as pd

dfx = pd.DataFrame({
    'ID': ['ID_1', 'ID_2', 'ID_3', 'ID_4'],
    'Extracts': [['QA,QB'], ['QB,QD'], ['QA,QD'], ['QC']],
    'QA': [0, 0, 0, 0],
    'QB': [0, 0, 0, 0],
    'QC': [0, 0, 0, 0],
    'QD': [0, 0, 0, 0],
})
If any of the text in the 'Extracts' column matches one of the last four column names, I want the corresponding cells to be changed from 0 to 1, as shown in the following tables.

From this:
| ID | Extracts | QA | QB| QC|QD |
|----|:--------:|----|---|---|---|
|ID_1|['QA,QB'] |0 |0 |0 |0 |
|ID_2|['QB,QD'] |0 |0 |0 |0 |
|ID_3|['QA,QD'] |0 |0 |0 |0 |
|ID_4|['QC'] |0 |0 |0 |0 |
To this:
| ID | Extracts | QA | QB| QC|QD |
|----|:--------:|----|---|---|---|
|ID_1|['QA,QB'] |1 |1 |0 |0 |
|ID_2|['QB,QD'] |0 |1 |0 |1 |
|ID_3|['QA,QD'] |1 |0 |0 |1 |
|ID_4|['QC'] |0 |0 |1 |0 |
So far I have tried looping through the columns:

for i in list(dfx.columns[2:6]):
    print(i)
    if dfx.Extracts.str.contains(i).any():
        dfx.i=1

But I cannot get this working. I would appreciate it if someone could guide me through this. Many thanks in advance.
Solution 1:
We can use indexing with the `str` accessor to select the strings, then use `get_dummies` to create a dataframe of indicator variables, and finally `update` the original dataframe using the values from the indicator dataframe:
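To make the chain easier to follow, here is a sketch of the same three steps written out one at a time. The intermediate names `first` and `indicators` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

dfx = pd.DataFrame({
    'ID': ['ID_1', 'ID_2', 'ID_3', 'ID_4'],
    'Extracts': [['QA,QB'], ['QB,QD'], ['QA,QD'], ['QC']],
    'QA': [0, 0, 0, 0],
    'QB': [0, 0, 0, 0],
    'QC': [0, 0, 0, 0],
    'QD': [0, 0, 0, 0],
})

# Step 1: each cell is a one-element list, so .str[0] pulls out the string inside it
first = dfx['Extracts'].str[0]

# Step 2: split each string on ',' and build one 0/1 indicator column per label
indicators = first.str.get_dummies(sep=',')

# Step 3: overwrite the columns of dfx that share a name with the indicator columns
# (update works in place and returns None)
dfx.update(indicators)
```

Only columns whose names appear in both dataframes are touched by `update`, so 'ID' and 'Extracts' are left as they were. The compact form chains the three calls into a single line.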
dfx.update(dfx['Extracts'].str[0].str.get_dummies(sep=','))
print(dfx)

     ID Extracts  QA  QB  QC  QD
0  ID_1  [QA,QB]   1   1   0   0
1  ID_2  [QB,QD]   0   1   0   1
2  ID_3  [QA,QD]   1   0   0   1
3  ID_4     [QC]   0   0   1   0
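
For comparison, the loop from the question can also be made to work. Two things stop it as written: `dfx.i = 1` assigns to an attribute literally named `i` rather than to the column whose name is stored in `i`, and `.any()` collapses the test to a single True/False for the whole column instead of one result per row. A minimal sketch of a row-wise version, assuming every 'Extracts' cell is a one-element list as in the example (the name `labels` is illustrative):

```python
import pandas as pd

dfx = pd.DataFrame({
    'ID': ['ID_1', 'ID_2', 'ID_3', 'ID_4'],
    'Extracts': [['QA,QB'], ['QB,QD'], ['QA,QD'], ['QC']],
    'QA': [0, 0, 0, 0],
    'QB': [0, 0, 0, 0],
    'QC': [0, 0, 0, 0],
    'QD': [0, 0, 0, 0],
})

# Take the string out of each list and split it into separate labels,
# e.g. ['QA,QB'] -> 'QA,QB' -> ['QA', 'QB']
labels = dfx['Extracts'].str[0].str.split(',')

for col in dfx.columns[2:6]:
    # dfx[col] addresses the column named by the variable;
    # the membership test runs once per row and yields 1 or 0
    dfx[col] = labels.apply(lambda parts: int(col in parts))
```

Testing membership in the split list, rather than searching for a substring, avoids false matches if one column name happens to be contained in another.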