Plot a histogram with y-axis as percentage (using FuncFormatter?)
Other answers seem utterly complicated. A histogram that shows proportions instead of absolute counts can easily be produced by weighting the data with 1/n, where n is the number of data points.
Then a PercentFormatter can be used to show the proportion (e.g. 0.45) as a percentage (45%).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
plt.hist(data, weights=np.ones(len(data)) / len(data))
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
Here we see that three of the 7 values are in the first bin, i.e. 3/7=43%.
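To check the 3/7 figure without drawing anything, the same weighting can be passed to numpy.histogram, which uses the same default of 10 equal-width bins. This is only a minimal sketch; the variable names first_bin_share and total_share are made up for illustration.

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# Each point contributes 1/n, so a bar's height is the share of points in its bin.
weights = np.ones(len(data)) / len(data)

# Default is 10 equal-width bins spanning min(data)..max(data).
proportions, edges = np.histogram(data, weights=weights)

first_bin_share = proportions[0]   # share of points in the first bin
total_share = proportions.sum()    # all shares together add up to 1

print(f"first bin: {first_bin_share:.0%}, total: {total_share:.0%}")
```

Because the heights already sum to 1, PercentFormatter(1) only has to relabel the axis; it does not rescale the data.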
You can calculate the percentages yourself, then plot them as a bar chart. This requires you to use numpy.histogram (which matplotlib uses "under the hood" anyway). You can then adjust the y tick labels:
import matplotlib.pyplot as plt
import numpy as np
f, ax = plt.subplots(1, 1, figsize=(10,5))
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
heights, bins = np.histogram(data, bins = len(list(set(data))))
percent = [i/sum(heights)*100 for i in heights]
ax.bar(bins[:-1], percent, width=2500, align="edge")
vals = ax.get_yticks()
ax.set_yticks(vals)  # pin the tick positions so the custom labels stay aligned
ax.set_yticklabels(['%1.2f%%' % i for i in vals])
plt.show()
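The width=2500 passed to ax.bar happens to match the bin width for this particular data set. As a sketch of how to avoid hard-coding it, the widths can be derived from the bin edges with np.diff, and the percentages computed with array arithmetic. The name widths is introduced here for illustration only.

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# One bin per distinct value, as in the answer above (6 bins for this data).
heights, bins = np.histogram(data, bins=len(set(data)))

# Convert counts to percentages of all data points.
percent = heights / heights.sum() * 100

# Width of each bin, taken from consecutive edges instead of a fixed number.
widths = np.diff(bins)

print(percent.round(2), widths)
```

These arrays could then be passed as ax.bar(bins[:-1], percent, width=widths, align="edge").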
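I think the simplest way is to use seaborn, which is a layer on top of matplotlib. Note that sns.displot is a figure-level function that creates its own figure; if you want to keep using plt.subplots(), figsize, ax, and fig to customize your plot, the axes-level sns.histplot accepts an ax argument and the same stat option.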
import seaborn as sns
And using the following code:
sns.displot(data, stat='probability')
Also, sns.displot has many parameters that make complex, informative graphs easy to produce. They can be found here: displot Documentation
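You can also set density to True, but be aware that this normalizes the area of the histogram, not the sum of the bar heights: each bar shows count / (n * bin width). The heights read as proportions only when the bins are 1 unit wide. For this data the default bins are 1500 wide, so the y-axis below shows densities, and the percent labels do not equal the share of points in each bin.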
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
plt.hist(data, density=True)
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
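One thing worth checking with density=True is what the bar heights actually mean. The sketch below uses numpy.histogram with the same default bins to compare the density values with the per-bin proportions; the names widths and proportions are illustrative.

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# density=True returns count / (n * bin_width) for each bin.
density, edges = np.histogram(data, density=True)
widths = np.diff(edges)

# Multiplying by the bin width recovers the share of points per bin.
proportions = density * widths

print(density[0], proportions[0])
```

For this data the first density value is 3 / (7 * 1500), while the corresponding proportion is 3/7, so the two only coincide when the bin width is 1.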
You can use functools.partial to avoid using globals in your example.
Just add n to the function parameters:
def to_percent(y, position, n):
    s = str(round(100 * y / n, 3))
    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'
    return s + '%'
and then create a partial function of two arguments that you can pass to FuncFormatter:
percent_formatter = partial(to_percent,
                            n=len(data))
formatter = FuncFormatter(percent_formatter)
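FuncFormatter calls the function it wraps with two arguments, the tick value and its position, which is why n has to be bound in advance. A minimal sketch of the partial on its own, without matplotlib and with the usetex branch left out for brevity (that simplification is an assumption of this example, not part of the answer's code):

```python
from functools import partial

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

def to_percent(y, position, n):
    # y is a raw count on the y-axis; position is unused but required
    # by the (value, position) calling convention of FuncFormatter.
    return str(round(100 * y / n, 3)) + '%'

# Bind n once; the result takes only (y, position).
percent_formatter = partial(to_percent, n=len(data))

# A tick at a count of 3 out of 7 points.
label = percent_formatter(3, 0)
print(label)
```

Binding n this way keeps to_percent free of module-level state, so the same function can serve histograms of different sizes.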
Full code:
from functools import partial
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
def to_percent(y, position, n):
    # y is a raw bin count; dividing by n turns it into a percentage
    s = str(round(100 * y / n, 3))
    # LaTeX rendering needs the percent sign escaped
    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'
    return s + '%'
def plotting_hist(data):
    f, ax = plt.subplots(figsize=(10, 5))
    ax.hist(data,
            bins=len(set(data)),
            rwidth=1)
    # bind n so that FuncFormatter receives a two-argument callable
    percent_formatter = partial(to_percent,
                                n=len(data))
    formatter = FuncFormatter(percent_formatter)
    plt.gca().yaxis.set_major_formatter(formatter)
    plt.show()
plotting_hist(data)