Print the unique values in every column of a pandas DataFrame
Solution 1:
It can be written more concisely like this:
for col in df:
    print(df[col].unique())
Generally, you can access a column of a DataFrame either by indexing with the [] operator (e.g. df['col']) or by attribute (e.g. df.col).
Attribute access is a bit more concise when the column name is known beforehand, but it has several caveats: it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, should always work.
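To make the difference between the two access styles concrete, here is a small sketch. The DataFrame and its column names ("unit price" and "index") are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data: one column name contains a space,
# the other shares its name with the DataFrame.index attribute.
df = pd.DataFrame({
    "unit price": [1.5, 2.0, 1.5],
    "index": ["a", "b", "a"],
})

# Bracket notation works for both columns.
unit_prices = df["unit price"].unique()
index_values = df["index"].unique()

# Attribute access does not reach the "index" column here:
# df.index is the row index of the DataFrame.
row_index = df.index

print(unit_prices)
print(index_values)
print(type(row_index).__name__)
```

A name like "unit price" cannot be written as an attribute at all, since df.unit price is a syntax error.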
Solution 2:
The most upvoted answer uses a loop, so here is a one-line alternative using the pandas apply() method with a lambda function:
print(df.apply(lambda col: col.unique()))
Solution 3:
This collects the unique values into a Series indexed by column name, with one array of unique values per column:
pd.Series({col:df[col].unique() for col in df})
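A minimal runnable sketch of this approach, assuming a small made-up DataFrame (the "city" and "year" columns are hypothetical). The columns deliberately have different numbers of unique values to show that the Series can hold arrays of different lengths.

```python
import pandas as pd

# Hypothetical sample data with 2 unique cities and 3 unique years.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "year": [2020, 2021, 2022, 2020],
})

# One entry per column; each entry is that column's array of unique values.
uniques = pd.Series({col: df[col].unique() for col in df})
print(uniques)

# Individual arrays can be looked up by column name.
city_values = list(uniques["city"])
year_values = list(uniques["year"])
```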
Solution 4:
If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}
Then you can access any dataframe easily using the name of the column:
df_dict['column_name']
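Here is a short end-to-end sketch of the dictionary-of-DataFrames idea. The sample data and the column names "city" and "year" are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2020, 2021, 2020],
})

# Build one single-column DataFrame of unique values per original column,
# keyed by the column name.
df_dict = {
    col: pd.DataFrame(df[col].unique(), columns=[col])
    for col in df.columns
}

# Look up the DataFrame for one column by its name.
city_df = df_dict["city"]
print(city_df)
```

Each value in df_dict is an independent DataFrame, so the frames can have different numbers of rows.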
Solution 5:
We can make this even more concise:
df.describe(include='all').loc['unique', :]
describe() reports a few key statistics for each column; here we keep only the 'unique' row, which holds the number of distinct values in each column rather than the values themselves.
Note that this gives NaN as the unique count for numeric columns. If you want a count for those columns as well, you can cast everything to object first:
df.astype('object').describe(include='all').loc['unique', :]
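The sketch below compares the two calls on a small made-up DataFrame (the "city" and "year" columns are hypothetical), so you can see where the NaN comes from and how the cast to object removes it.

```python
import pandas as pd

# Hypothetical sample data: one text column and one numeric column.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2020, 2021, 2020],
})

# Without the cast, the numeric column has no 'unique' statistic (NaN).
mixed = df.describe(include="all").loc["unique", :]

# After casting to object, every column gets a unique count.
as_object = df.astype("object").describe(include="all").loc["unique", :]

print(mixed)
print(as_object)
```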