Create new column based on a value of another column in a data-frame
A snippet of my current data-frame is:
|   | commentID | commentType | depth | parentID   |
|:--|:----------|:-----------:|:-----:|:-----------|
| 0 | 58b61d1d  | comment     | 1.0   | 0.0        |
| 1 | 58b6393b  | userReply   | 2.0   | 58b61d1d.0 |
| 2 | 58b6556e  | comment     | 1.0   | 0.0        |
| 3 | 58b657fa  | userReply   | 3.0   | 58b61d1d.0 |
| 4 | 58b657fa  | comment     | 1.0   | 0.0        |
I want the data-frame to look like:
|   | commentID | commentType | depth | parentID   | receiveAReply |
|:--|:----------|:-----------:|:-----:|:-----------|--------------:|
| 0 | 58b61d1d  | comment     | 1.0   | 0.0        | 1             |
| 1 | 58b6393b  | userReply   | 2.0   | 58b61d1d.0 | 0             |
| 2 | 58b6556e  | comment     | 1.0   | 0.0        | 0             |
| 3 | 58b657fa  | userReply   | 3.0   | 58b61d1d.0 | 0             |
| 4 | 58b657fa  | comment     | 1.0   | 0.0        | 0             |
- An added column: receiveAReply.
- If a comment receives any reply, it is assigned 1; otherwise 0. A comment with multiple replies is still assigned only 1.
- All user replies receive 0, even if the reply itself has a reply (e.g. depth = 3.0). I only care about comments on the actual article and whether they received a reply, not the number of replies or the replies to those replies.
- Therefore, I am focusing on user replies with depth 2.0 and on which commentIDs their parentIDs match.
I have the following code, but it assigns NaN to the whole receiveAReply column. I first try to create another column, 'replies', holding the parent IDs of the rows with depth 2.0, and then try to assign 1 wherever a commentID matches one of these parent IDs:
df['replies'] = df.loc[df.depth == 2.0, ['parentID']]
df['receiveAReply'] = df.loc[df.commentID == df.replies, [1]]
IIUC, you just need to extract the left part of the parentID column (the part before the dot):
pid = df.loc[df['depth'] == 2, 'parentID'].str.split('.').str[0].values
df['receiveAReply'] = 0
df.loc[df['commentID'].isin(pid), 'receiveAReply'] = 1
Output:
>>> df
commentID commentType depth parentID receiveAReply
0 58b61d1d comment 1.0 0.0 1
1 58b6393b userReply 2.0 58b61d1d.0 0
2 58b6556e comment 1.0 0.0 0
3 58b657fa userReply 3.0 58b61d1d.0 0
4 58b657fa comment 1.0 0.0 0
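To reproduce this end to end, here is a minimal self-contained sketch. It assumes parentID is stored as strings exactly as shown in the question. The extra `commentType == "comment"` check is my own addition, not part of the snippet above; it guards the stated rule that user replies always get 0.

```python
import pandas as pd

# Sample data copied from the question; parentID is assumed to be a string column.
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa", "58b657fa"],
    "commentType": ["comment", "userReply", "comment", "userReply", "comment"],
    "depth": [1.0, 2.0, 1.0, 3.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0", "58b61d1d.0", "0.0"],
})

# Parent IDs of the depth-2 replies, keeping only the part before the dot.
pid = df.loc[df["depth"] == 2, "parentID"].str.split(".").str[0]

# Assumption: only rows typed "comment" may be flagged, so replies stay at 0.
is_comment = df["commentType"] == "comment"

# 1 where a top-level comment's ID appears among those parent IDs, else 0.
df["receiveAReply"] = (is_comment & df["commentID"].isin(pid)).astype(int)
```

Building the column from a boolean mask and casting with `astype(int)` avoids the separate "initialise to 0, then overwrite with 1" step, but the result should be the same for this data.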
This worked for me:
df['replies'] = df.loc[df.depth == 2.0, ['parentID']]
def test(x, y):
    if x in y.values:
        return 1
    else:
        return 0
df['getsReply'] = df['commentID'].apply(lambda x: test(x, df['replies']))
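A possible vectorized variant of the same membership test is sketched below, on a cut-down copy of the question's data. It assumes the IDs are strings and that parentID carries the trailing `.0` shown in the question; in that case the suffix has to be removed first, otherwise `58b61d1d` is compared with `58b61d1d.0` and never matches. If your parentID values have no suffix, the `str.replace` step is harmless.

```python
import pandas as pd

# Reduced sample based on the question's data (assumed string IDs).
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e"],
    "depth": [1.0, 2.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0"],
})

# Parent IDs of depth-2 replies, with a trailing ".0" stripped if present.
replies = df.loc[df["depth"] == 2.0, "parentID"].str.replace(r"\.0$", "", regex=True)

# isin tests every commentID against the parent IDs in one pass,
# replacing the row-by-row apply call.
df["getsReply"] = df["commentID"].isin(replies).astype(int)
```

On large frames `isin` is usually much faster than `apply` with a Python function, because the lookup is not repeated per row in Python.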