Using Pandas .count() and assigning it to the y-axis returns an empty chart in Seaborn
I'm trying to do a simple .count() of the number of times the words "pos" and "neg" appear, so that I can use it as the y-axis of my Seaborn barplot.
Here's my code:
df['score'] = df['compound'].apply(lambda x: 'pos' if x>= 0 else 'neg')
df['overall'] = df[(df.score == "pos") & (df.score == "neg")].count()
plt.figure(2)
chart_2 = sns.barplot(x='Category', y='overall', data=df)
When I run this, plt.figure(2) shows an empty chart. I've tried .sum(), which doesn't work either and also returns an empty chart.
If I execute the following, it returns the overall total without breaking it down per Category on the x-axis. For example, every Category gets a total of 58, which is the total number of rows in the DataFrame:
df['overall'] = df['score'].count()
Using .value_counts() also returns an empty chart.
I've run out of ideas on why this might be the case!
Thanks in advance.
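As mentioned in the comments, (df.score == "pos") & (df.score == "neg") combines the two conditions with AND, and since a score can't be both at once, it gives False in all cases. Using OR, as in (df.score == "pos") | (df.score == "neg"), gives True in all cases, but it still doesn't differentiate between categories. On top of that, .count() on a DataFrame returns one value per column, indexed by the column names. When that Series is assigned to df['overall'], pandas aligns it with the row index, no labels match, and the column is filled with NaN, so the plot stays empty. Assigning a scalar such as df['score'].count() instead puts the same total, 58, in every row.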
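A minimal sketch of both effects on a small, made-up DataFrame (the values below are hypothetical, not the question's data). It shows that the AND mask selects nothing, the OR mask selects everything, and that assigning the per-column result of .count() to a new column yields only NaN because of index alignment:

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the question's data
df = pd.DataFrame({'compound': [-0.5, 0.2, 0.7, -0.1],
                   'Category': ['a', 'a', 'b', 'b']})
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

# AND: a score can't be 'pos' and 'neg' at once, so no row is selected
mask_and = (df.score == 'pos') & (df.score == 'neg')
print(mask_and.any())  # False

# OR: every row is selected, so categories are still not separated
mask_or = (df.score == 'pos') | (df.score == 'neg')
print(mask_or.all())  # True

# .count() returns one value per *column*, indexed by the column names
counts = df[mask_or].count()
print(counts.index.tolist())  # ['compound', 'Category', 'score']

# Assigning that Series aligns it on the row index (0..3); no labels match,
# so every row gets NaN and a bar plot has nothing to draw
df['overall'] = counts
print(df['overall'].isna().all())  # True

# A scalar, by contrast, is broadcast: the same total in every row
df['overall'] = df['score'].count()
print(df['overall'].tolist())  # [4, 4, 4, 4]
```

Presumably the same alignment explains the empty chart with .value_counts(): its result is indexed by 'pos' and 'neg', which also don't match any row label.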
The easiest way to create a bar plot of counts is seaborn's sns.countplot(). Setting x='Category' counts the rows in each category, and hue='score' splits each category's bar by score.
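To inspect the numbers such a countplot draws, one option (pd.crosstab, not part of the original answer) is to tabulate the same row counts in pandas. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data; 'score' is derived as in the question
df = pd.DataFrame({'compound': [-0.5, 0.2, 0.7, -0.1, 0.3],
                   'Category': ['a', 'a', 'b', 'b', 'b']})
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

# Rows: categories; columns: scores; cells: number of rows per combination.
# These correspond to the bar heights of
# sns.countplot(data=df, x='Category', hue='score')
table = pd.crosstab(df['Category'], df['score'])
print(table)

# Summing across scores gives the bar heights of sns.countplot(data=df, x='Category')
totals = table.sum(axis=1)
print(totals.tolist())  # [2, 3]
```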
To create a barplot directly and do the counting via pandas, you'd need something like df['overall'] = df['Category'].apply(lambda cat: (df['Category'] == cat).sum()). Here, (df['Category'] == cat) creates an array of boolean True and False values. When sum() is called on these, True counts as 1 and False as 0, so sum() gives the number of True values.
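The boolean-sum trick can be checked in isolation. A minimal sketch on hypothetical data; the last step uses Series.map over value_counts(), a vectorized alternative that is not from the original answer, which yields the same column without re-scanning the whole Series for every row:

```python
import pandas as pd

# Hypothetical data with three categories of different sizes
df = pd.DataFrame({'Category': ['a', 'a', 'a', 'b', 'c', 'c']})

# Comparing a Series to a scalar gives a boolean Series ...
is_a = df['Category'] == 'a'
print(is_a.tolist())  # [True, True, True, False, False, False]

# ... and summing it counts the True values (True == 1, False == 0)
print(is_a.sum())  # 3

# The answer's approach: for each row, count the rows sharing its category
df['overall'] = df['Category'].apply(lambda cat: (df['Category'] == cat).sum())
print(df['overall'].tolist())  # [3, 3, 3, 1, 2, 2]

# Alternative: look each category up in a table of counts
df['overall_map'] = df['Category'].map(df['Category'].value_counts())
print(df['overall_map'].tolist())  # [3, 3, 3, 1, 2, 2]
```

The apply version compares the whole column once per row, so its cost grows quadratically with the number of rows; the map version counts once and then does a lookup per row.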
Pandas' preferred way to count by category would be to group via groupby('Category') and then take the size() of each group.
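The same groupby pattern extends to what the question was after, counting 'pos' and 'neg' separately per category, by grouping on both columns. A sketch on hypothetical data; with as_index=False, size() returns a DataFrame with a 'size' column in recent pandas versions:

```python
import pandas as pd

# Hypothetical toy data; 'score' is derived as in the question
df = pd.DataFrame({'compound': [-0.5, 0.2, 0.7, -0.1, 0.3],
                   'Category': ['a', 'a', 'b', 'b', 'b']})
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

# One row per (Category, score) pair; 'size' holds the number of matching rows.
# Group keys are sorted, so the order is (a, neg), (a, pos), (b, neg), (b, pos).
df_split = df.groupby(['Category', 'score'], as_index=False).size()
print(df_split)
print(df_split['size'].tolist())  # [1, 1, 1, 2]
```

The result can then be plotted with sns.barplot(data=df_split, x='Category', y='size', hue='score'), which draws the same bars as the countplot with hue='score'.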
Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Toy data: 12 rows spread over three categories
df = pd.DataFrame({'compound': [-20, -10, 100, 200, 300, -20, -10, 100, -10, 100, 200, 300],
                   'Category': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']})
# Label each row 'pos' or 'neg' depending on the sign of 'compound'
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

fig, (ax1, ax2, ax3, ax4) = plt.subplots(ncols=4, figsize=(14, 4))

# 1) countplot: seaborn counts the rows itself, split by score via hue
sns.countplot(data=df, x='Category', hue='score', palette='rocket', ax=ax1)
ax1.set_title('Countplot: Score per category')
ax1.locator_params(axis='y', integer=True)  # only integer ticks on the y-axis

# 2) countplot: total number of rows per category
sns.countplot(data=df, x='Category', palette='turbo', ax=ax2)
ax2.set_title('Countplot: Overall sum per category')

# 3) barplot on a precomputed column holding each row's category count
df['overall'] = df['Category'].apply(lambda cat: (df['Category'] == cat).sum())
sns.barplot(data=df, x='Category', y='overall', palette='turbo', ax=ax3)
ax3.set_title('Barplot: Using the "overall" column')

# 4) barplot on one row per category, produced by groupby().size()
df_counts = df.groupby('Category', as_index=False).size()
sns.barplot(data=df_counts, x='Category', y='size', palette='turbo', ax=ax4)
ax4.set_title('Barplot: Using groupby')

sns.despine()
plt.tight_layout()
plt.show()