Python - Turn all items in a DataFrame to strings

Solution 1:

You can use this:

df = df.astype(str)
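
As a quick illustration, here is a minimal sketch on a small made-up DataFrame (the name sample and its columns are assumptions for the example, not taken from the question):

```python
import pandas as pd

# Hypothetical sample data with integer, float and boolean columns
sample = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": [True, False]})

# astype(str) converts every cell to its string form and returns a new DataFrame
as_text = sample.astype(str)

first_row = as_text.iloc[0].tolist()
print(first_row)  # ['1', '1.5', 'True']
```

The original DataFrame is left untouched, which is why the answer assigns the result back to df.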

Out of curiosity, I checked whether there is any difference in efficiency between the accepted solution (df.applymap) and mine (df.astype).

The results are below:

example df:

df = pd.DataFrame([list(range(1000))], index=[0])

test df.astype:

%timeit df.astype(str)
100 loops, best of 3: 2.18 ms per loop

test df.applymap:

%timeit df.applymap(str)
1 loops, best of 3: 245 ms per loop

On this example df.astype is roughly 100 times faster (2.18 ms versus 245 ms per loop) :)
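
To rerun the comparison outside IPython, a rough sketch with the standard timeit module might look like this. The helper name elementwise_str is made up for the example; it assumes DataFrame.map is the element-wise method on pandas 2.1 and later and falls back to applymap on older versions. Absolute timings depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Same example frame as above: one row, 1000 integer columns
df = pd.DataFrame([list(range(1000))], index=[0])


def elementwise_str(frame):
    """Apply str() cell by cell (hypothetical helper for this benchmark)."""
    # DataFrame.map replaced applymap in pandas 2.1; use whichever exists
    if hasattr(pd.DataFrame, "map"):
        return frame.map(str)
    return frame.applymap(str)


# Total seconds for 5 calls of each approach
t_astype = timeit.timeit(lambda: df.astype(str), number=5)
t_elementwise = timeit.timeit(lambda: elementwise_str(df), number=5)

print(f"astype(str):      {t_astype / 5 * 1000:.2f} ms per call")
print(f"element-wise str: {t_elementwise / 5 * 1000:.2f} ms per call")
```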

Solution 2:

You can use the applymap method (deprecated since pandas 2.1 in favor of DataFrame.map):

df = df.applymap(str)

Solution 3:

With pandas >= 1.0 there is a dedicated string datatype.

You can convert your columns to this pandas string datatype using .astype('string'):

df = df.astype('string')

This is different from using str, which sets the pandas 'object' datatype:

df = df.astype(str)

You can see the difference in datatypes when you look at the info of the dataframe:

df = pd.DataFrame({
    'zipcode_str': [90210, 90211] ,
    'zipcode_string': [90210, 90211],
})

df['zipcode_str'] = df['zipcode_str'].astype(str)
df['zipcode_string'] = df['zipcode_str'].astype('string')

df.info()

# you can see that the first column has dtype object
# while the second column has the new dtype string
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   zipcode_str     2 non-null      object
 1   zipcode_string  2 non-null      string
dtypes: object(1), string(1)


From the docs:

The 'string' extension type solves several issues with object-dtype NumPy arrays:

1) You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

2) object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.

3) When reading code, the contents of an object dtype array is less clear than string.
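
Point 2 can be illustrated with a small sketch. The frame and its column names below are invented for the example; it assumes that select_dtypes accepts 'string' as a selector for columns of the dedicated string dtype.

```python
import pandas as pd

# Hypothetical frame: one real text column, one object column holding
# non-text Python objects, and one integer column
frame = pd.DataFrame({
    "name": pd.array(["alice", "bob"], dtype="string"),
    "payload": [{"k": 1}, [1, 2]],
    "n": [1, 2],
})

# Selecting by the string dtype picks up only the text column and skips
# the object column that holds a dict and a list
text_cols = frame.select_dtypes(include="string").columns.tolist()
print(text_cols)  # ['name']
```

Had name been stored as object dtype, there would be no dtype-based way to tell it apart from payload.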


Information about pandas 1.0 can be found here:
https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html

Solution 4:

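This worked for me on a DataFrame (dt) whose cells hold lists. It keeps the first element of each list and replaces every non-list cell with None: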

dt.applymap(lambda x: x[0] if type(x) is list else None)
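
For reference, a small sketch of what that expression does, on made-up data (the column name tags and its values are assumptions). It uses DataFrame.map when available, since applymap is deprecated from pandas 2.1 on.

```python
import pandas as pd

# Hypothetical frame where some cells are lists and one is a plain string
dt = pd.DataFrame({"tags": [["x", "y"], "plain", ["z"]]})


def first_if_list(cell):
    # Keep the first element of a list; anything else becomes None
    return cell[0] if type(cell) is list else None


# Pick the element-wise method that exists in the installed pandas version
if hasattr(pd.DataFrame, "map"):
    result = dt.map(first_if_list)
else:
    result = dt.applymap(first_if_list)

print(result)
```

Rows 0 and 2 end up holding 'x' and 'z', while the non-list cell in row 1 becomes a missing value.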