how to do a left,right and mid of a string in a pandas dataframe

in a pandas dataframe how can I apply a sort of excel left('state',2) to only take the first two letters. Ideally I want to learn how to use left,right and mid in a dataframe too. So need an equivalent and not a "trick" for this specific example.

data = {'state': ['Auckland', 'Otago', 'Wellington', 'Dunedin', 'Hamilton'],
'year': [2000, 2001, 2002, 2001, 2002],
'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
df = pd.DataFrame(data)

print df

     pop       state  year
 0  1.5    Auckland  2000
 1  1.7       Otago  2001
 2  3.6  Wellington  2002
 3  2.4     Dunedin  2001
 4  2.9    Hamilton  2002

I want to get this:

    pop       state     year  StateInitial
 0  1.5       Auckland    2000     Au
 1  1.7       Otago       2001     Ot
 2  3.6       Wellington  2002     We
 3  2.4       Dunedin     2001     Du
 4  2.9       Hamilton    2002     Ha

Solution 1:

First two letters for each value in a column:

>>> df['StateInitial'] = df['state'].str[:2]
>>> df
   pop       state  year StateInitial
0  1.5    Auckland  2000           Au
1  1.7       Otago  2001           Ot
2  3.6  Wellington  2002           We
3  2.4     Dunedin  2001           Du
4  2.9    Hamilton  2002           Ha

For last two that would be df['state'].str[-2:]. Don't know what exactly you want for middle, but you can apply arbitrary function to a column with apply method:

>>> df['state'].apply(lambda x: x[len(x)/2-1:len(x)/2+1])
0    kl
1    ta
2    in
3    ne
4    il

Solution 2:

With regards to the mid, probably a short cut code would be df['state'].str[3,5]

this will start from the 3rd character and give you the 3rd and 4th character of the string.

how to do a left,right and mid of a string in a pandas dataframe

Solution 1:

Solution 2:

Related

Recent Posts