How to reproduce this graph?
Here is my code:
library(rvest)
library(dplyr)
library(tidyr)
col_link <- "https://ourworldindata.org/famines#famines-by-world-region-since-1860"
col_page <- read_html(col_link)
col_table <- col_page %>%
  html_nodes("table#tablepress-73") %>%
  html_table() %>%
  .[[1]]
new_data <- col_table %>%
  select(Year, Country, `Excess Mortality midpoint`)
new_data
I would like to arrange the years and countries so that I can use them in a graph, but I can't. My objective is to reproduce this graph:
My problem is that in the "Year" column, some entries span several years for a country. For example, the famine that lasted from 1846 to 1852 in Ireland is recorded as "1846-52", and I cannot use the data in this form for a graph.
   Year      Country           `Excess Mortality midpoint`
   <chr>     <chr>             <chr>
 1 1846–52   Ireland           1,000,000
 2 1860-1    India             2,000,000
 3 1863-67   Cape Verde        30,000
 4 1866-7    India             961,043
 5 1868      Finland           100,000
 6 1868-70   India             1,500,000
 7 1870–1871 Persia (now Iran) 1,000,000
 8 1876–79   Brazil            750,000
 9 1876–79   India             7,176,346
10 1877–79   China             11,000,000
# ... with 67 more rows
Solution 1:
I think this is more a question about the data than about R programming. You could try matching the year ranges to decades. However, if a range spans several decades, the data has to be split up in some way (e.g. a simple proportional split) to accommodate that. If the chart you linked to was made with this data, some assumptions had to be made to adjust it; without knowing those assumptions, you won't be able to reproduce the chart.
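As a rough illustration of the proportional split, here is a minimal base-R sketch. It assumes the deaths are spread evenly over the calendar years of each range, which is my assumption and not necessarily what the original chart did. The helper names `parse_year_range` and `split_by_decade` are made up for this example, and the sample rows are copied from the output in the question. The real table may contain year formats this does not handle.

```r
# A few rows copied from the output shown in the question.
# "\u{2013}" is the en dash that appears in some of the ranges.
famines <- data.frame(
  Year    = c("1846\u{2013}52", "1860-1", "1868", "1870\u{2013}1871", "1876\u{2013}79"),
  Country = c("Ireland", "India", "Finland", "Persia (now Iran)", "Brazil"),
  `Excess Mortality midpoint` = c("1,000,000", "2,000,000", "100,000",
                                  "1,000,000", "750,000"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "1846-52", "1860-1", "1868", "1870-1871"
# into integer start and end years.
parse_year_range <- function(x) {
  # Normalise en dashes to plain hyphens, then split on the hyphen.
  parts <- strsplit(gsub("\u{2013}", "-", x, fixed = TRUE), "-", fixed = TRUE)
  start <- as.integer(vapply(parts, function(p) p[1], character(1)))
  end_txt <- vapply(parts, function(p) if (length(p) > 1) p[2] else p[1],
                    character(1))
  n <- nchar(end_txt)
  # An abbreviated end year ("52", "1") borrows its leading digits
  # from the start year.
  short <- n < 4
  end_txt[short] <- paste0(substr(as.character(start[short]), 1, 4 - n[short]),
                           end_txt[short])
  end <- as.integer(end_txt)
  # If an abbreviated range rolls over (e.g. "1899-00"), move the end forward.
  wrap <- end < start
  end[wrap] <- end[wrap] + as.integer(10^n[wrap])
  data.frame(start = start, end = end)
}

# Hypothetical helper: spread each famine's deaths evenly over its years
# (an assumption), then total them per decade.
split_by_decade <- function(df) {
  rng <- parse_year_range(df$Year)
  deaths <- as.numeric(gsub(",", "", df[["Excess Mortality midpoint"]],
                            fixed = TRUE))
  rows <- lapply(seq_len(nrow(df)), function(i) {
    years <- seq(rng$start[i], rng$end[i])
    decade <- (years %/% 10) * 10
    per_year <- rep(deaths[i] / length(years), length(years))
    per_decade <- tapply(per_year, decade, sum)
    data.frame(Country = df$Country[i],
               decade  = as.integer(names(per_decade)),
               deaths  = as.numeric(per_decade),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

by_decade <- split_by_decade(famines)
# One total per decade, ready for plotting.
decade_totals <- aggregate(deaths ~ decade, data = by_decade, FUN = sum)
```

With this split, the Irish famine contributes four sevenths of its deaths to the 1840s and three sevenths to the 1850s. A different rule (for example, assigning everything to the starting decade) would change the chart, which is why the assumptions matter.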