How to add annotations made from .pct_change() data to a line plot
I have these data:
values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]
I have made a line plot of these data:
data = pd.DataFrame(values, columns = ["Research_categories",'2017', '2018', '2019', '2020', '2021'])
data.set_index('Research_categories', inplace=True)
df = data.T
plot = df.plot()
plt.subplots_adjust(right=0.869)
plt.show()
Now I need to add an annotation to each point for each year. The annotation should show the percentage change, so I prepared this dataframe:
percentage_df = data.pct_change(axis='columns')
This is what the dataframe looks like:
                                  2017       2018       2019       2020       2021
Research_categories
Arts & Humanities                  NaN  -0.293158  -0.463142  -0.291262  -0.483366
Life Sciences & Biomedicine        NaN  -0.163329  -0.164780  -0.279271  -0.543157
Physical Sciences                  NaN  -0.145805  -0.175523  -0.358398  -0.578894
Social Sciences                    NaN  -0.165451  -0.192108  -0.214683  -0.535568
Technology                         NaN  -0.060976  -0.256291  -0.201910  -0.495043
How can I take data from this dataframe and display them as an annotation in the plot?
I am very new to visualization in Python, and so far this is a very tricky part for me. I would be grateful for any help. Thank you very much!
Matplotlib has a built-in annotate function, where you simply need to specify the text of the annotation and the coordinates where you want it to be.
In your case, we just need to iterate over both dataframes to get the y-value of each point (from data) and the value to be written on the graph (from percentage_df).
for i, column in enumerate(data):
    if column != '2017':  # no point annotating NaNs
        for val1, val2 in zip(data[column], percentage_df[column]):
            plot.annotate(
                text = val2,
                xy = (i, val1),  # must use counter as data is plotted as categorical
            )
Note that because your x-axis is technically categorical (the years are strings, not numbers), the points are plotted at positions 0, 1, 2, and so on. We therefore use enumerate to get a counter, which gives us the x-position for each annotation.
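As a quick check of the categorical-axis behaviour, here is a minimal sketch that plots a small frame with a string index and reads back the x-coordinates of the drawn line. The frame `demo` and its column name are made up for illustration; only `DataFrame.plot`, `Axes.get_lines` and `Line2D.get_xdata` are relied on.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy frame: the index holds year labels as strings, as in the question.
demo = pd.DataFrame({"some_category": [3.0, 2.0, 1.0]},
                    index=["2017", "2018", "2019"])

ax = demo.plot()

# The line is drawn against integer positions, not against the year strings.
x_positions = [int(x) for x in ax.get_lines()[0].get_xdata()]
print(x_positions)
```

This is why the enumerate counter, rather than the year label itself, is passed as the x-coordinate in xy.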
This gives the following graph:
which satisfies your criteria but looks pretty bad. So let's clean it up a little by making the figure bigger and rounding the numbers to 2 decimal places.
Full code:
import pandas as pd
import matplotlib.pyplot as plt
values = [["Arts & Humanities",19.00, 13.43, 7.21, 5.11, 2.64],
["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]
]
data = pd.DataFrame(values, columns = ["Research_categories",'2017', '2018', '2019', '2020', '2021'])
data.set_index('Research_categories', inplace=True)
df = data.T
fig, ax = plt.subplots(1,1, figsize = (8,5), dpi = 150)
df.plot(ax=ax)
percentage_df = data.pct_change(axis='columns')
for i, column in enumerate(data):
    if column != '2017':  # no point annotating NaNs
        for val1, val2 in zip(data[column], percentage_df[column]):
            ax.annotate(
                text = round(val2, 2),
                xy = (i, val1),  # must use counter as data is plotted as categorical
            )
plt.show()
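If you would rather show the changes as percentages and lift the labels slightly off the markers, one possible variation is sketched below. It is not the only way to do it: the `.0%` format, the 5-point vertical offset and the font size are my own choices, and the `labels` list exists only so the generated strings can be inspected.

```python
import pandas as pd
import matplotlib.pyplot as plt

values = [["Arts & Humanities", 19.00, 13.43, 7.21, 5.11, 2.64],
          ["Life Sciences & Biomedicine", 64.41, 53.89, 45.01, 32.44, 14.82],
          ["Physical Sciences", 43.62, 37.26, 30.72, 19.71, 8.30],
          ["Social Sciences", 50.71, 42.32, 34.19, 26.85, 12.47],
          ["Technology", 52.48, 49.28, 36.65, 29.25, 14.77]]
data = pd.DataFrame(values, columns=["Research_categories", '2017', '2018', '2019', '2020', '2021'])
data = data.set_index('Research_categories')
percentage_df = data.pct_change(axis='columns')

fig, ax = plt.subplots(figsize=(8, 5), dpi=150)
data.T.plot(ax=ax)

labels = []  # kept only for inspection; not needed for the plot itself
for i, column in enumerate(data.columns):
    if i == 0:
        continue  # the first year has no previous value, so its change is NaN
    for y, change in zip(data[column], percentage_df[column]):
        label = f"{change:.0%}"  # e.g. a change of -0.293158 is written as a whole percent
        ax.annotate(
            label,
            xy=(i, y),                   # the data point being annotated
            xytext=(0, 5),               # shift the text 5 points upward
            textcoords="offset points",  # interpret xytext as an offset from xy
            ha="center",
            fontsize=8,
        )
        labels.append(label)
```

Call plt.show() afterwards to display the figure. Skipping the first column by position instead of comparing against '2017' keeps the loop working if the first year changes.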