Print the unique values in every column of a pandas DataFrame

Solution 1:

It can be written more concisely like this:

for col in df:
    print(df[col].unique())

Generally, you can access a column of a DataFrame either by indexing with the [] operator (e.g. df['col']) or by attribute access (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats. For example, it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, should always work.
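To make the caveats concrete, here is a minimal sketch. The sample frame and its column names ("price", "index", "unit cost") are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample frame; column names are chosen to trigger the caveats.
df = pd.DataFrame({
    "price": [1, 2, 2],
    "index": ["a", "b", "b"],
    "unit cost": [5, 5, 6],
})

# Attribute access works for a plain identifier that clashes with nothing.
price_unique = df.price.unique()

# df.index is the DataFrame's row index, not the column named "index".
row_index = df.index

# Bracket access reaches the column whatever its name is.
index_col_unique = df["index"].unique()
unit_cost_unique = df["unit cost"].unique()
```

The column "unit cost" cannot be reached by attribute access at all, because a name containing a space is not valid Python syntax after the dot.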

Solution 2:

The most upvoted answer uses a loop, so here is a one-line alternative using the pandas apply() method with a lambda function.

print(df.apply(lambda col: col.unique()))
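One thing to watch with the apply() approach: as far as I can tell, the type of the result depends on the data. In the pandas versions I have looked at, you get a Series of arrays when the columns have different numbers of unique values, and a DataFrame when they all happen to have the same number. The sketch below uses two made-up frames and prints the types rather than assuming them.

```python
import pandas as pd

# Hypothetical frames: in the first the unique counts differ (2 vs 3),
# in the second they match (2 vs 2).
ragged_df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "z"]})
even_df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "y"]})

ragged = ragged_df.apply(lambda col: col.unique())
even = even_df.apply(lambda col: col.unique())

# Inspect the container type instead of relying on it.
print(type(ragged).__name__, type(even).__name__)

# Indexing by column name reaches that column's unique values in either case.
ragged_b = list(ragged["b"])
even_a = list(even["a"])
```

If you need a predictable result type, building the result explicitly from a dict of columns avoids the ambiguity.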

Solution 3:

This collects the unique values into a Series indexed by column name, with one array of unique values per column:

pd.Series({col:df[col].unique() for col in df})
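A short usage sketch of the Series approach, assuming a small made-up frame (the "city" and "year" columns are hypothetical):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2020, 2021, 2021],
})

# One entry per column; each entry holds that column's unique values.
uniques = pd.Series({col: df[col].unique() for col in df})

# Look up a single column's unique values by name.
city_values = uniques["city"]

# Number of unique values per column, derived from the same Series.
counts = uniques.map(len)
```

Because the result is always a Series, it behaves the same way whether or not the columns have the same number of unique values.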

Solution 4:

If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:

df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

Then you can access any dataframe easily using the name of the column:

df_dict['column_name']
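For reference, an end-to-end sketch of the dictionary-of-DataFrames idea on a made-up frame (the column names are hypothetical), written as a dict comprehension:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2020, 2021, 2021],
})

# Map each column name to a one-column DataFrame of its unique values.
df_dict = {
    col: pd.DataFrame(df[col].unique(), columns=[col])
    for col in df.columns
}

# Retrieve the DataFrame for one column by its name.
city_df = df_dict["city"]
```

Each value in df_dict is an independent DataFrame, so the frames can have different lengths without any padding.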

Solution 5:

If you only need the number of unique values in each column, rather than the values themselves, this is even more concise:

df.describe(include='all').loc['unique', :]

Pandas describe() gives a few key statistics about each column; here we just grab the 'unique' row, which holds the count of distinct values, and leave it at that.

Note that this gives NaN as the unique count for numeric columns. If you want to include those columns as well, you can cast everything to object first:

df.astype('object').describe(include='all').loc['unique', :]
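The sketch below runs both variants on a made-up frame to show the NaN for the numeric column and the effect of the object cast. It also calls df.nunique(), which is not part of this answer but is a built-in pandas method that returns per-column unique counts directly.

```python
import pandas as pd

# Hypothetical sample data: one text column, one numeric column.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2020, 2021, 2021],
})

# 'unique' is only computed for non-numeric columns, so "year" is NaN here.
counts_raw = df.describe(include="all").loc["unique", :]

# Casting to object makes describe() treat every column as categorical.
counts_obj = df.astype("object").describe(include="all").loc["unique", :]

# Alternative not used in the answer above: nunique() counts directly.
counts_direct = df.nunique()
```

All three results are counts, so none of them replaces the earlier solutions if you need the unique values themselves.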