Plot a histogram with the y-axis as a percentage (using FuncFormatter?)

Other answers seem overly complicated. A histogram that shows the proportion instead of the absolute count can easily be produced by weighting each data point with 1/n, where n is the number of data points.

Then a PercentFormatter can be used to show the proportion (e.g. 0.45) as a percentage (45%). The argument 1 passed to PercentFormatter is the value that corresponds to 100%.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

plt.hist(data, weights=np.ones(len(data)) / len(data))

plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()


Here three of the seven values fall into the first bin, i.e. 3/7 ≈ 43%.
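To check the arithmetic without drawing anything, np.histogram accepts the same weights argument, and plt.hist passes its weights through to it. A minimal sketch, assuming the default 10 equal-width bins:

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# each value contributes 1/n instead of 1, so bar heights become proportions
weights = np.ones(len(data)) / len(data)
proportions, edges = np.histogram(data, weights=weights)

print(edges[:2])          # first bin is [1000, 2500) with the default 10 bins
print(proportions[0])     # 3 of the 7 values, i.e. about 0.4286
print(proportions.sum())  # the proportions of all bins add up to 1
```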


You can calculate the percentages yourself, then plot them as a bar chart. This requires you to use numpy.histogram (which matplotlib uses "under the hood" anyway). You can then adjust the y tick labels:

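import matplotlib.pyplot as plt
import numpy as np

f, ax = plt.subplots(1, 1, figsize=(10, 5))
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# one bin per distinct value
heights, bins = np.histogram(data, bins=len(set(data)))

# convert the counts to percentages of the total
percent = heights / heights.sum() * 100

# start each bar at its left bin edge and make it as wide as its bin
ax.bar(bins[:-1], percent, width=np.diff(bins), align="edge")

# pin the tick positions before relabelling them
vals = ax.get_yticks()
ax.set_yticks(vals)
ax.set_yticklabels(['%1.2f%%' % i for i in vals])

plt.show()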

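Instead of rewriting fixed tick labels, a variant is to keep the bars in percent units and attach a PercentFormatter with xmax=100, which adds the % sign and keeps working when the y-limits change. A sketch reusing the same data; sizing the bars with np.diff(bins) is my choice, not part of the answer above:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

heights, bins = np.histogram(data, bins=len(set(data)))
percent = heights / heights.sum() * 100  # values are already in 0-100

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(bins[:-1], percent, width=np.diff(bins), align="edge")

# xmax=100 because the bar heights are already percentages
ax.yaxis.set_major_formatter(PercentFormatter(xmax=100))
plt.show()
```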


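I think the simplest way is to use seaborn, which is built on top of matplotlib. Note that sns.displot is a figure-level function that creates its own figure, so it does not take an ax from plt.subplots(); size it with its height and aspect parameters, or use the axes-level sns.histplot(..., ax=ax) when you want to draw into an existing matplotlib Axes.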

import seaborn as sns

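Then use:

sns.displot(data, stat='probability')

With stat='probability' the y-axis shows proportions between 0 and 1; seaborn 0.11.2 and later also accept stat='percent', which scales the bars so that they sum to 100.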

sns.displot has many more parameters for building richer plots with little code; see the displot documentation.


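Setting density=True normalizes the histogram, but to a probability density: each bar's height is count / (n * bin_width), so the bar areas, not the heights, sum to 1. The heights equal proportions only when every bin has width 1. With equal-width bins you can still label the axis as a percentage by scaling the formatter by the bin width: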

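import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

# density=True makes the bar areas (height * width) sum to 1
_, bins, _ = plt.hist(data, density=True)
bin_width = bins[1] - bins[0]  # the default bins all have the same width

# proportion = density * bin_width, so a density of 1 / bin_width is 100%
plt.gca().yaxis.set_major_formatter(PercentFormatter(1 / bin_width))
plt.show()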
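To see why density=True alone mislabels the axis, compare the density heights with the proportions. np.histogram takes the same density flag as plt.hist; this sketch uses the default 10 bins, which are 1500 wide for this data:

```python
import numpy as np

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

density, edges = np.histogram(data, density=True)
widths = np.diff(edges)

print(density[0])          # tiny: count / (n * bin_width) = 3 / (7 * 1500)

# height * width is the share of the data in each bin
proportions = density * widths
print(proportions[0])      # about 3/7
print(proportions.sum())   # the areas add up to 1
```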

You can use functools.partial to avoid using globals in your example.

Just add n to the function's parameters:

def to_percent(y, position, n):
    s = str(round(100 * y / n, 3))

    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'

    return s + '%'

Then create a partial function of the remaining two arguments (y, position), which is the signature FuncFormatter expects:

percent_formatter = partial(to_percent,
                            n=len(data))
formatter = FuncFormatter(percent_formatter)
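To see what FuncFormatter will receive, you can call the partial directly with a tick value and a position. A small sketch, assuming text.usetex is off (the matplotlib default) and n=7 as in the example data:

```python
from functools import partial

import matplotlib


def to_percent(y, position, n):
    s = str(round(100 * y / n, 3))

    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'

    return s + '%'


# n is bound once; the partial now takes (y, position) like a FuncFormatter callback
percent_formatter = partial(to_percent, n=7)

print(percent_formatter(3, 0))  # a tick at 3 counts out of 7 values -> 42.857%
```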

Full code:

from functools import partial

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]


def to_percent(y, position, n):
    s = str(round(100 * y / n, 3))

    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'

    return s + '%'


def plotting_hist(data):
    f, ax = plt.subplots(figsize=(10, 5))
    ax.hist(data,
            bins=len(set(data)),
            rwidth=1)

    # bind n so that FuncFormatter only has to supply (y, position)
    percent_formatter = partial(to_percent,
                                n=len(data))
    formatter = FuncFormatter(percent_formatter)
    ax.yaxis.set_major_formatter(formatter)

    plt.show()


plotting_hist(data)

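For comparison, matplotlib's built-in PercentFormatter performs the same count-to-percent division through its xmax parameter, so it can stand in for the custom to_percent function (the number of decimals it shows may differ). A sketch with the same data and bins:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(data, bins=len(set(data)), rwidth=1)

# xmax=len(data): a tick at k counts is shown as 100 * k / n percent
ax.yaxis.set_major_formatter(PercentFormatter(xmax=len(data)))
plt.show()
```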