Plotting categorical data with pandas and matplotlib
I have a data frame with categorical data:
colour direction
1 red up
2 blue up
3 green down
4 red left
5 red right
6 yellow down
7 blue down
I want to generate some graphs, like pie charts and histograms, based on the categories. Is it possible without creating dummy numeric variables? Something like:
df.plot(kind='hist')
You can simply use value_counts on the series:
df['colour'].value_counts().plot(kind='bar')
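Since the question also asks about pie charts, here is a minimal self-contained sketch of the same idea. It assumes the frame from the question; the variable names (counts, ax_bar, ax_pie) are only illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data frame from the question
df = pd.DataFrame({
    'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
    'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down'],
})

# value_counts turns the categories into a numeric Series of frequencies,
# so no dummy variables are needed
counts = df['colour'].value_counts()

# Draw the same counts as a bar chart and as a pie chart, side by side
fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 4))
counts.plot(kind='bar', ax=ax_bar)
counts.plot(kind='pie', ax=ax_pie)
ax_pie.set_ylabel('')  # drop the default y label next to the pie
```

Call plt.show() afterwards (or fig.savefig with a file name of your choice) to see the result.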
You might find the mosaic plot from statsmodels useful. It can also give statistical highlighting for the variances.
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
plt.rcParams['font.size'] = 16.0  # larger labels inside the tiles
mosaic(df, ['direction', 'colour'])
But beware of zero-sized cells: they will cause problems with labels.
See this answer for details.
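One way to see in advance which cells are empty is to cross-tabulate the two columns before plotting. This is only a sketch using pandas' crosstab on the question's data; it does not change how mosaic itself handles empty cells.

```python
import pandas as pd

# Data frame from the question
df = pd.DataFrame({
    'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
    'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down'],
})

# Contingency table: rows are directions, columns are colours
table = pd.crosstab(df['direction'], df['colour'])

# Count the direction/colour combinations that never occur
n_empty = int((table == 0).to_numpy().sum())
print(table)
print('empty cells:', n_empty)
```

With this small sample most combinations never occur, so the label problem is likely to show up.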
You can also count the rows per group and plot the result, like this:
df.groupby('colour').size().plot(kind='bar')
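As a rough comparison with value_counts, the sketch below (again on the question's data) shows that both approaches produce the same counts. The visible difference is the bar order: groupby sorts by category name, while value_counts sorts by frequency.

```python
import pandas as pd

# Colour column from the question
df = pd.DataFrame({
    'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
})

by_group = df.groupby('colour').size()   # ordered by category name
by_count = df['colour'].value_counts()   # ordered by descending frequency

print(by_group)
print(by_count)
```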
You could also use countplot from seaborn. This package builds on matplotlib and works closely with pandas to create a high-level plotting interface. It gives you good styling and correct axis labels for free.
import pandas as pd
import seaborn as sns
sns.set()
df = pd.DataFrame({'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
                   'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down']})
sns.countplot(x='colour', data=df, color='gray')
It also supports coloring the bars in the right color with a little trick:
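# Map each colour name to itself; recent seaborn wants hue set when a palette is given
sns.countplot(x='colour', hue='colour', data=df, dodge=False,
              palette={color: color for color in df['colour'].unique()})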
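If seaborn is not available, roughly the same effect can be sketched with plain matplotlib. This relies on the category labels happening to be valid matplotlib colour names, which is true for the question's data but is an assumption in general.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Colour column from the question
df = pd.DataFrame({
    'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
})

counts = df['colour'].value_counts()
names = list(counts.index)

# Each bar is filled with the colour its own label names
fig, ax = plt.subplots()
bars = ax.bar(names, counts.to_numpy(), color=names)
ax.set_xlabel('colour')
ax.set_ylabel('count')
```

For labels that are not colour names you would need an explicit mapping from label to colour instead.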