Using pandas .count() and assigning the result to the y-axis returns an empty chart in Seaborn

I'm trying to count the number of times the words "pos" and "neg" appear, so that I can use the counts on the y-axis of my Seaborn barplot.

Here's my code:

df['score'] = df['compound'].apply(lambda x: 'pos' if x>= 0 else 'neg')
df['overall'] = df[(df.score == "pos") & (df.score == "neg")].count()

plt.figure(2)
chart_2 = sns.barplot(x='Category', y='overall', data=df)

When I run this, plt.figure(2) returns an empty chart. I've also tried .sum(), which returns an empty chart as well.

If I execute the following, it returns the overall total without breaking it down per Category on the x-axis. For example, every Category shows a total of 58, which is the total number of rows in the dataframe.

df['overall'] = df['score'].count()

.value_counts() also returns an empty chart.

I've run out of ideas on why this might be the case!

Thanks in advance.

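As mentioned in the comments, (df.score == "pos") & (df.score == "neg") combines the two conditions with AND, so it is False for every row: a score cannot be both "pos" and "neg". Using OR, as in (df.score == "pos") | (df.score == "neg"), is True for every row. That still doesn't differentiate between categories, so the count would be 58 everywhere. In addition, calling .count() on a DataFrame returns one value per column, indexed by column name. Assigning that to df['overall'] aligns on the row index, finds no matching labels, and fills the column with NaN, which is why the chart comes out empty.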
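To make the points above concrete, here is a minimal sketch on made-up data (the four-row frame is hypothetical and only stands in for the asker's 58-row dataframe). It checks that the AND mask is never True, that the OR mask is always True, and that assigning the result of .count() to a column produces only NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's dataframe
df = pd.DataFrame({'compound': [-0.5, 0.2, 0.7, -0.1],
                   'Category': ['a', 'a', 'b', 'b']})
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

and_mask = (df.score == 'pos') & (df.score == 'neg')  # a value cannot be both
or_mask = (df.score == 'pos') | (df.score == 'neg')   # every value is one of the two

print(and_mask.any())  # False
print(or_mask.all())   # True

# DataFrame.count() returns a Series indexed by column name, not by row
counts = df[or_mask].count()
print(counts.index.tolist())  # ['compound', 'Category', 'score']

# Assignment aligns on the index; no row label matches, so every value is NaN
df['overall'] = counts
print(df['overall'].isna().all())  # True
```

A column of NaN gives sns.barplot nothing to draw, which matches the empty chart described in the question.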

The easiest way to create a bar plot of counts is seaborn's sns.countplot(). Setting x='Category' counts the rows in each category, and hue='score' splits each of those counts by score.

To create a barplot directly and do the counting in pandas, you'd need something like df['overall'] = df['Category'].apply(lambda cat: (df['Category'] == cat).sum()). Here, (df['Category'] == cat) creates a boolean Series of True and False values. When sum() is called on it, True counts as 1 and False as 0, so sum() returns the number of True values.
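As a small illustration of the boolean-sum idea, the sketch below uses a hypothetical five-element Series (not the asker's data) and prints the intermediate values.

```python
import pandas as pd

categories = pd.Series(['a', 'a', 'b', 'a', 'c'])  # hypothetical values

is_a = categories == 'a'   # boolean Series marking the rows equal to 'a'
print(is_a.tolist())       # [True, True, False, True, False]
print(is_a.sum())          # 3, since each True counts as 1

# Repeating the comparison for every row gives each row its category's count
per_row = categories.apply(lambda cat: (categories == cat).sum())
print(per_row.tolist())    # [3, 3, 1, 3, 1]
```

Note that this compares every row against every other row, so it is fine for small frames but scales poorly for large ones.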

The more idiomatic pandas way to count by category is to call groupby('Category') and then take the size() of each group.

Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.DataFrame({'compound': [-20, -10, 100, 200, 300, -20, -10, 100, -10, 100, 200, 300],
                   'Category': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']})
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

fig, (ax1, ax2, ax3, ax4) = plt.subplots(ncols=4, figsize=(14, 4))

sns.countplot(data=df, x='Category', hue='score', palette='rocket', ax=ax1)
ax1.set_title('Countplot: Score per category')
ax1.locator_params(axis='y', integer=True)

sns.countplot(data=df, x='Category', palette='turbo', ax=ax2)
ax2.set_title('Countplot: Overall sum per category')

df['overall'] = df['Category'].apply(lambda cat: (df['Category'] == cat).sum())
sns.barplot(data=df, x='Category', y='overall', palette='turbo', ax=ax3)
ax3.set_title('Barplot: Using the "overall" column')

df_counts = df.groupby('Category', as_index=False).size()
sns.barplot(data=df_counts, x='Category', y='size', palette='turbo', ax=ax4)
ax4.set_title('Barplot: Using groupby')

sns.despine()
plt.tight_layout()
plt.show()

[Figure: sns.countplot vs sns.barplot]
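If the "pos" and "neg" counts are needed as numbers rather than only as a plot, one possible approach is to group by both columns. This is a sketch that reuses the toy data from the example above; the column name 'count' is chosen here for illustration.

```python
import pandas as pd

# Same toy data as in the example above
df = pd.DataFrame({'compound': [-20, -10, 100, 200, 300, -20, -10, 100, -10, 100, 200, 300],
                   'Category': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']})
df['score'] = df['compound'].apply(lambda x: 'pos' if x >= 0 else 'neg')

# One row per (Category, score) pair; 'count' is a column name chosen here
score_counts = (df.groupby(['Category', 'score'])
                  .size()
                  .reset_index(name='count'))
print(score_counts)
```

The resulting frame can then be passed to sns.barplot(data=score_counts, x='Category', y='count', hue='score'), which should give the same bars as the countplot with hue='score'.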
