How to split a dataframe column into 2 new columns, one with all the words before the last item and one with the last item
I have a dataframe with a column that contains addresses. I would like to split the addresses so that the last word is in a column Ending and the words before it are in a separate column Beginning. The addresses vary in length, e.g.:
- Main Street
- Jon Smith Close
- The Rovers Avenue
After searching different resources, I came up with the following:
new_address_df['begining'], new_address_df['ending'] = new_address_df['street'].str.split().str[:-1].apply(lambda x: ' '.join(map(str, x))), new_address_df['street'].str.split().str[-1]
The code works, but I am not sure if it's the right way to write it in Python. Another option would have been to convert the column to a list, modify the data in list form, and then convert it back to a dataframe, but I guess that might not be the best approach.
Is there a way to improve the above code if it's not Pythonic?
There are certainly a lot of ways of doing this :) I would go for using str and rpartition. rpartition splits each string at the last occurrence of the separator into 3 components: the part before the separator, the separator itself, and the part after the separator. If you just take the first and last parts, you should be done.
df[["begining", "ending"]]=df.street.str.rpartition(" ")[[0,2]]
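To see what the three parts look like, here is a minimal sketch using Python's built-in str.rpartition, which the pandas .str.rpartition method is documented as being equivalent to for each element. The list of sample strings, including the one-word "Broadway", is made up for illustration.

```python
# Built-in str.rpartition splits at the LAST occurrence of the separator
# and always returns a 3-tuple: (before, separator, after).
streets = ["Main Street", "Jon Smith Close", "The Rovers Avenue", "Broadway"]

parts = [s.rpartition(" ") for s in streets]
for street, (before, sep, after) in zip(streets, parts):
    print(f"{street!r}: before={before!r}, sep={sep!r}, after={after!r}")

# Keeping only the first and last elements mirrors selecting columns 0 and 2.
beginning_ending = [(before, after) for before, _, after in parts]
```

Note that when the separator is absent, as in "Broadway", the whole string lands in the last part and the first part is an empty string.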
You could use a regular expression for this, as follows:
import pandas as pd
df = pd.DataFrame({"street":["Main Street","Jon Smith Close","The Rovers Avenue"]})
df2 = df.street.str.extract(r"(?P<Beginning>.+)\s(?P<Ending>\S+)")
df = pd.concat([df,df2],axis=1)
print(df)
Output:
              street   Beginning  Ending
0        Main Street        Main  Street
1    Jon Smith Close   Jon Smith   Close
2  The Rovers Avenue  The Rovers  Avenue
Explanation: I used named capturing groups, which result in a pandas.DataFrame with columns of those names, which I then concat with the original df using axis=1. In the pattern, the groups are separated by a single whitespace character (\s); in the Beginning group any character is allowed, while in the Ending group only non-whitespace (\S) characters are allowed.
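As a rough illustration of why the split lands on the last whitespace, the same pattern can be tried with the standard re module. The helper name split_street and the one-word sample "Broadway" are hypothetical, added only for this sketch.

```python
import re

# Same pattern as above: the greedy ".+" takes as much as it can, so the
# single "\s" ends up matching the LAST whitespace in the string.
pattern = re.compile(r"(?P<Beginning>.+)\s(?P<Ending>\S+)")

def split_street(street):
    """Hypothetical helper: return (Beginning, Ending), or None if no match."""
    match = pattern.search(street)
    if match is None:
        return None
    return match.group("Beginning"), match.group("Ending")

results = {s: split_street(s) for s in ["The Rovers Avenue", "Broadway"]}
print(results)
```

A string with no whitespace does not match at all; with str.extract, such rows come back as NaN in both columns, so one-word addresses may need separate handling.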