How to add the total of a specific set of rows of a column, and then add this to another dataset?

Solution 1:

You can do this with the dplyr package in R. First group the data by the location variable, then summarise the new_cases column with sum().

The code will look like this:

library(dplyr)

# Store the totals under a new name so the original df stays available
totals <- df %>%
  group_by(location) %>%
  summarise(totalCases = sum(new_cases))
totals

The output will look like this:

# A tibble: 238 x 2
   location            totalCases
   <chr>                    <dbl>
 1 Afghanistan             158602
 2 Africa                10230722
 3 Albania                     NA
 4 Algeria                 224383
 5 Andorra                     NA
 6 Angola                   92581
 7 Anguilla                    NA
 8 Antigua and Barbuda         NA
 9 Argentina                   NA
10 Armenia                     NA
# ... with 228 more rows

Note: This gives totalCases for every location. A total of NA means that location has at least one missing value in new_cases.
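If the NA totals are unwanted, sum() accepts na.rm = TRUE to skip missing values. A minimal sketch on a small made-up data frame (the name toy and its values are hypothetical, not taken from the original data):

```r
library(dplyr)

# Toy data with one missing value (hypothetical, for illustration only)
toy <- data.frame(
  location  = c("A", "A", "B", "B"),
  new_cases = c(10, 5, NA, 7)
)

# na.rm = TRUE drops missing values before summing,
# so location "B" gets a numeric total instead of NA
toy_totals <- toy %>%
  group_by(location) %>%
  summarise(totalCases = sum(new_cases, na.rm = TRUE))
toy_totals
```

Whether dropping missing values is appropriate depends on why they are missing, so treat this as an option rather than a default.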

To get the total for a single location, filter the rows first with filter().

df2 <- df %>%
  filter(location == "Afghanistan") %>%
  group_by(location) %>%
  summarise(totalCases = sum(new_cases))
df2

Output:

# A tibble: 1 x 2
  location    totalCases
  <chr>            <dbl>
1 Afghanistan     158602

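Since the result is stored in a new data frame called df2, you can merge it with another data frame of your choice, for example with base R's merge() or one of dplyr's join functions such as left_join(). See the documentation of those functions for the available options.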
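As a rough sketch of that merge step, assuming the other dataset also has a location column: the data frame other, its region column, and its values are made up for illustration, and df2 is recreated by hand so the example runs on its own.

```r
library(dplyr)

# Stand-in for the one-row summary computed above
df2 <- data.frame(location = "Afghanistan", totalCases = 158602)

# Hypothetical second dataset; column names and values are assumptions
other <- data.frame(
  location = c("Afghanistan", "Algeria"),
  region   = c("Asia", "Africa")
)

# Keep every row of `other` and attach totalCases where location matches;
# rows without a match in df2 get NA in totalCases
merged <- left_join(other, df2, by = "location")
merged
```

left_join() keeps all rows of the first data frame. If you only want rows present in both, inner_join() with the same arguments is the usual alternative.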