Create new column based on a value of another column in a data-frame

A snippet of my current data-frame is:

|   | commentID | commentType | depth | parentID   |
|:--|:----------|:------------|:------|:-----------|
| 0 | 58b61d1d  | comment     | 1.0   | 0.0        |
| 1 | 58b6393b  | userReply   | 2.0   | 58b61d1d.0 |
| 2 | 58b6556e  | comment     | 1.0   | 0.0        |
| 3 | 58b657fa  | userReply   | 3.0   | 58b61d1d.0 |
| 4 | 58b657fa  | comment     | 1.0   | 0.0        |

I want the data-frame to look like:

|   | commentID | commentType | depth | parentID   | receiveAReply |
|:--|:----------|:------------|:------|:-----------|--------------:|
| 0 | 58b61d1d  | comment     | 1.0   | 0.0        | 1             |
| 1 | 58b6393b  | userReply   | 2.0   | 58b61d1d.0 | 0             |
| 2 | 58b6556e  | comment     | 1.0   | 0.0        | 0             |
| 3 | 58b657fa  | userReply   | 3.0   | 58b61d1d.0 | 0             |
| 4 | 58b657fa  | comment     | 1.0   | 0.0        | 0             |

  • There is one added column: receiveAReply.
  • Any comment that receives a reply is assigned 1. A comment with multiple replies is still assigned 1, so the value is always 1 or 0.
  • All user replies receive 0, even if the reply itself has a reply (e.g. depth = 3.0). I only care about comments on the actual article and whether they received a reply, not the number of replies or the replies to those replies.
  • Therefore, I am focusing on user replies with depth 2.0 and on which commentIDs their parentIDs match.

I have the following code, but it fills the whole receiveAReply column with NaN. I first create another column, 'replies', holding the parent IDs of the rows with depth 2.0, and then try to assign 1 wherever a commentID matches one of these parent IDs:


df['replies'] = df.loc[df.depth == 2.0, ['parentID']]
df['receiveAReply'] = df.loc[df.commentID == df.replies, [1]]

If I understand your conditions correctly, you are only missing one step: extracting the part of the parentID column before the dot, so that it can match commentID:

pid = df.loc[df['depth'] == 2, 'parentID'].str.split('.').str[0].values

df['receiveAReply'] = 0
df.loc[df['commentID'].isin(pid), 'receiveAReply'] = 1

Output:

>>> df
  commentID commentType  depth    parentID  receiveAReply
0  58b61d1d     comment    1.0         0.0              1
1  58b6393b   userReply    2.0  58b61d1d.0              0
2  58b6556e     comment    1.0         0.0              0
3  58b657fa   userReply    3.0  58b61d1d.0              0
4  58b657fa     comment    1.0         0.0              0
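
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes that parentID is stored as strings such as '58b61d1d.0' (the question does not state the dtype), and it adds an explicit commentType check so that user replies can never be flagged, which is my reading of the requirements rather than something the code above needs for this sample:

```python
import pandas as pd

# Sample frame rebuilt from the question; string parentID is an assumption.
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa", "58b657fa"],
    "commentType": ["comment", "userReply", "comment", "userReply", "comment"],
    "depth": [1.0, 2.0, 1.0, 3.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0", "58b61d1d.0", "0.0"],
})

# Parent IDs of the depth-2 replies, with the trailing ".0" removed.
pid = df.loc[df["depth"] == 2, "parentID"].str.split(".").str[0]

# 1 for top-level comments whose ID appears among those parent IDs, else 0.
is_parent = df["commentID"].isin(pid)
is_comment = df["commentType"] == "comment"
df["receiveAReply"] = (is_parent & is_comment).astype(int)

print(df)
```

Building the column from a boolean mask with astype(int) avoids the separate "initialise to 0, then overwrite with 1" step, but both forms give the same column here.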

This worked for me:

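# Parent IDs of the depth-2.0 replies without the trailing ".0"
# (assumes parentID holds strings such as '58b61d1d.0'); other rows become NaN
df['replies'] = df.loc[df.depth == 2.0, 'parentID'].str.split('.').str[0]

def test(x, y):
    # 1 if the comment ID occurs anywhere among the parent IDs, else 0
    return 1 if x in y.values else 0


df['getsReply'] = df['commentID'].apply(lambda x: test(x, df['replies']))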
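
As a side note on why a membership test is needed here: `==` between two columns compares them row by row, so a comment is only ever compared with the parent ID sitting on its own row, never with the parent IDs on other rows. A small sketch of the difference, assuming a cut-down version of the sample data with parentID stored as strings (the names `parents`, `row_wise` and `membership` are just for illustration):

```python
import pandas as pd

# First three rows of the question's data; string parentID is an assumption.
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e"],
    "depth": [1.0, 2.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0"],
})

# Parent IDs of the depth-2 replies, with the trailing ".0" removed.
parents = df.loc[df["depth"] == 2.0, "parentID"].str.split(".").str[0]

# Row-wise comparison: after aligning to df's index, only row 1 has a parent
# value, and it is compared with row 1's own commentID, so nothing matches.
row_wise = df["commentID"] == parents.reindex(df.index)

# Membership test: every commentID is checked against all parent IDs.
membership = df["commentID"].isin(parents)

print(row_wise.tolist())
print(membership.tolist())
```

The `apply` with `x in y.values` above performs the same kind of membership test one element at a time; `isin` does it in a single vectorised call.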