Better way to plot Gender count using Python
I am making a graph to plot the gender count for time series data that looks like the following. Each row represents one hour of data for the respective patient.
HR | SBP | DBP | Sepsis | Gender | P_ID |
---|---|---|---|---|---|
92 | 120 | 80 | 0 | 0 | 0 |
98 | 115 | 85 | 0 | 0 | 0 |
93 | 125 | 75 | 1 | 1 | 1 |
95 | 130 | 90 | 1 | 1 | 1 |
102 | 120 | 80 | 0 | 0 | 2 |
109 | 115 | 75 | 0 | 0 | 2 |
94 | 135 | 100 | 0 | 0 | 2 |
97 | 100 | 70 | 1 | 1 | 3 |
85 | 120 | 80 | 1 | 1 | 3 |
88 | 115 | 75 | 1 | 1 | 3 |
93 | 125 | 85 | 1 | 1 | 3 |
78 | 130 | 90 | 1 | 0 | 4 |
115 | 140 | 110 | 1 | 0 | 4 |
102 | 120 | 80 | 0 | 1 | 5 |
98 | 140 | 110 | 0 | 1 | 5 |
This is my code:
gender = df_n['Gender'].value_counts()
plt.figure(figsize=(7, 6))
ax = gender.plot(kind='bar', rot=0, color="c")
ax.set_title("Bar Graph of Gender", y = 1)
ax.set_xlabel('Gender')
ax.set_ylabel('Number of People')
ax.set_xticklabels(('Male', 'Female'))
for rect in ax.patches:
    y_value = rect.get_height()
    x_value = rect.get_x() + rect.get_width() / 2
    space = 1
    label = format(y_value)
    ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')
plt.show()
Right now the code counts the total number of rows for each value (0: Male, 1: Female) and plots that. But I want to plot the number of male and female patients, not the number of 0s and 1s, because the same patient has multiple rows of data (as per P_ID). In other words, how many patients are male and how many are female?
Can someone help me out? I guess maybe sns.countplot can be used, but I don't know how.
Thanks for helping me out >.<
__________ Update ________________
How can I group the genders by sepsis (1) or no sepsis (0)?
__________ Update 2 ___________
So, I got the actual total count of males and females, thanks to @Shaido.
In the whole dataset, there are only 2932 septic patients; the rest are non-septic. This is what I got from @JohanC's answer.
Now the problem is that, although there are 2932 septic patients, the graph makes it look as if only 426 of them (251 male and 175 female) are septic and the rest are non-septic. But this is not true. Please help. Thanks.
Solution 1:
I have a working example for selecting the unique IDs. It looks ugly, so there is probably a better way, but it works:
import pandas as pd
# example of data:
data = {'gender': [0, 0, 1, 1, 1, 1, 0, 0], 'id': [1, 1, 2, 2, 3, 3, 4, 4]}
df = pd.DataFrame(data)
# get all unique ids:
ids = set(df.id)
# go over all ids and take the first gender value of each:
g = [list(df[df['id'] == i]['gender'])[0] for i in ids]
# count genders; the lazy way is to use pandas, since the rest of the code also assumes a pandas object for plotting
# (a Series is used here so the result has a plain 0/1 index):
gender_counts = pd.Series(g).value_counts()
# from here you can use your plot function.
# Or use Counter instead:
from collections import Counter
gender_counts = Counter(g)
# With Counter you have to write a separate plotting step for the gender counts.
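A shorter variant of the same idea, as a sketch only: collapse the hourly rows to one row per patient with groupby, then count. It uses the column names and sample rows from the question. Two things here are assumptions, not something the question confirms: that Gender never changes within a P_ID, and that a patient should count as septic if any of their rows has Sepsis == 1. If the Sepsis flag can switch from 0 to 1 partway through a patient's rows, taking only the first row per patient would undercount septic patients, which might explain the numbers in Update 2.

```python
import pandas as pd

# Sample rows copied from the question's table (several hourly rows per patient).
df_n = pd.DataFrame({
    "Sepsis": [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    "Gender": [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1],
    "P_ID":   [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5],
})

# One row per patient.
# Assumption: Gender is constant per patient, so "first" is enough.
# Assumption: a patient is septic if ANY of their rows has Sepsis == 1, hence "max".
patients = df_n.groupby("P_ID").agg(
    Gender=("Gender", "first"),
    Sepsis=("Sepsis", "max"),
)

# Readable labels (0: Male, 1: Female, as stated in the question).
patients["Gender"] = patients["Gender"].map({0: "Male", 1: "Female"})
patients["Sepsis"] = patients["Sepsis"].map({0: "No sepsis", 1: "Sepsis"})

# Number of patients (not rows) per gender.
gender_counts = patients["Gender"].value_counts()

# Number of patients per gender, split by sepsis status.
by_sepsis = pd.crosstab(patients["Gender"], patients["Sepsis"])

print(gender_counts)
print(by_sepsis)
```

Either result can go into the plotting code from the question, for example `gender_counts.plot(kind='bar', rot=0)` for the plain count or `by_sepsis.plot(kind='bar', rot=0)` for grouped bars per sepsis status.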