How to convert column with dtype as object to string in Pandas Dataframe [duplicate]

When I read a csv file to pandas dataframe, each column is cast to its own datatypes. I have a column that was converted to an object. I want to perform string operations for this column such as splitting the values and creating a list. But no such operation is possible because its dtype is object. Can anyone please let me know the way to convert all the items of a column to strings instead of objects?

I tried several ways but nothing worked. I used astype, str(), to_string etc.

a=lambda x: str(x).split(',')
df['column'].apply(a)

or

df['column'].astype(str)

Solution 1:

since strings data types have variable length, it is by default stored as object dtype. If you want to store them as string type, you can do something like this.

df['column'] = df['column'].astype('|S80') #where the max length is set at 80 bytes,

or alternatively

df['column'] = df['column'].astype('|S') # which will by default set the length to the max len it encounters

Solution 2:

Did you try assigning it back to the column?

df['column'] = df['column'].astype('str') 

Referring to this question, the pandas dataframe stores the pointers to the strings and hence it is of type 'object'. As per the docs ,You could try:

df['column_new'] = df['column'].str.split(',') 

Solution 3:

Not answering the question directly, but it might help someone else.

I have a column called Volume, having both - (invalid/NaN) and numbers formatted with ,

df['Volume'] = df['Volume'].astype('str')
df['Volume'] = df['Volume'].str.replace(',', '')
df['Volume'] = pd.to_numeric(df['Volume'], errors='coerce')

Casting to string is required for it to apply to str.replace

pandas.Series.str.replace
pandas.to_numeric

Solution 4:

You could try using df['column'].str. and then use any string function. Pandas documentation includes those like split