How to add annotation made of .pct_change() data to line plot

I have these data:

values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]

I have made a line plot of these data.

import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(values, columns = ["Research_categories",'2017', '2018', '2019', '2020', '2021'])
data.set_index('Research_categories', inplace=True)
df = data.T  # transpose so each research category becomes one line
plot = df.plot()
plt.subplots_adjust(right=0.869)

plt.show()


Now I need to annotate each point with the percentage change from the previous year. So I prepared this dataframe:

percentage_df = data.pct_change(axis='columns')

This is what the dataframe looks like:

                             2017      2018      2019      2020      2021
Research_categories
Arts & Humanities             NaN -0.293158 -0.463142 -0.291262 -0.483366
Life Sciences & Biomedicine   NaN -0.163329 -0.164780 -0.279271 -0.543157
Physical Sciences             NaN -0.145805 -0.175523 -0.358398 -0.578894
Social Sciences               NaN -0.165451 -0.192108 -0.214683 -0.535568
Technology                    NaN -0.060976 -0.256291 -0.201910 -0.495043

How can I take the values from this dataframe and display them as annotations in the plot?

I am very new to visualization in Python, and so far this has been a very tricky part for me. I would be grateful for any help. Thank you very much!

Matplotlib has a built-in annotate function where you simply need to specify the text of the annotation and the coordinates where you want it to appear.
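As a minimal sketch of that call on its own, the snippet below labels one point of a small made-up line. The data values and the label "peak" are invented for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [10, 20, 15], marker="o")  # made-up points for illustration

# First argument is the annotation text, xy is the point (in data coordinates)
# that the text is attached to.
label = ax.annotate("peak", xy=(1, 20))
```

The text is passed positionally here because the keyword name for it has changed between Matplotlib versions (older releases used s, newer ones use text).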

In your case, we just need to iterate over both dataframes to get the y-value of the data (from data) and the value to be written on the graph (from percentage_df).

for i, column in enumerate(data):
    if column != '2017': # no point plotting NaNs
        for val1, val2 in zip(data[column], percentage_df[column]):
            plot.annotate(
                text = val2,
                xy = (i, val1), # must use counter as data is plotted as categorical
                )

Note that because your data is technically categorical (the years are strings, not numbers), we need to use enumerate to get a counter which gives us an x-position for each annotation.
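If you want to check this for yourself, here is a small sketch that reads back the x-coordinates pandas actually used for a line drawn from a string index. The one-column dataframe is made up for illustration, and the behaviour shown is what I would expect from current pandas versions rather than a documented guarantee.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up series with a string index, like the years in the question.
df = pd.DataFrame({"A": [3.0, 2.0, 1.0]}, index=["2017", "2018", "2019"])
ax = df.plot()

# The line is drawn at integer positions and the strings are only tick labels.
line = ax.get_lines()[0]
x_positions = [int(x) for x in line.get_xdata()]
print(x_positions)
```

The positions come out as consecutive integers starting at 0, which is exactly what enumerate produces in the loop above.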

This gives the following graph:

raw answer

which satisfies your criteria but looks pretty bad. So let's clean it up a little by making the figure bigger and rounding the numbers to 2 decimal places.

Cleaned up

Full code:

import pandas as pd
import matplotlib.pyplot as plt

values = [["Arts & Humanities",19.00, 13.43, 7.21, 5.11, 2.64], 
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26,  30.72,  19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47], 
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]
         ]

data = pd.DataFrame(values, columns = ["Research_categories",'2017', '2018', '2019', '2020', '2021'])
data.set_index('Research_categories', inplace=True)
df = data.T

fig, ax = plt.subplots(1,1, figsize = (8,5), dpi = 150)

df.plot(ax=ax)

percentage_df = data.pct_change(axis='columns')

for i, column in enumerate(data):
    if column != '2017': # no point plotting NaNs
        for val1, val2 in zip(data[column], percentage_df[column]):
            ax.annotate(
                text = round(val2, 2),
                xy = (i, val1), # must use counter as data is plotted as categorical
                )

plt.show()
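If you would rather show the labels as percentages and lift them slightly off the line, one possible variation is sketched below. It uses only two of the five categories to stay short. The offset of 6 points, the font size and the one-decimal format are arbitrary choices of mine, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]
data = pd.DataFrame(values, columns=["Research_categories", "2017", "2018", "2019", "2020", "2021"])
data = data.set_index("Research_categories")
percentage_df = data.pct_change(axis="columns")

fig, ax = plt.subplots(figsize=(8, 5))
data.T.plot(ax=ax, marker="o")

labels = []  # collected only so the label text can be inspected afterwards
for i, column in enumerate(data.columns):
    if column == "2017":
        continue  # first year has no previous value, so its change is NaN
    for y, change in zip(data[column], percentage_df[column]):
        label = ax.annotate(
            f"{change:.1%}",             # e.g. -0.293158 becomes "-29.3%"
            xy=(i, y),                   # the data point being labelled
            xytext=(0, 6),               # shift the text 6 points upwards
            textcoords="offset points",  # interpret xytext as an offset from xy
            ha="center",
            fontsize=8,
        )
        labels.append(label.get_text())
```

Using an offset in points rather than in data units keeps the gap between the marker and its label the same regardless of the y-axis scale.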
