Using str_split and removing duplicate values from the same record in R

You can do this with stringr::str_split to make a list-column of genres. genre will become a list of character vectors, which you can then unnest, then take distinct observations.

library(tidyverse)

books %>%
  mutate(genre = str_split(genre, ", ")) %>%
  unnest(genre) %>%
  distinct()
#> # A tibble: 12 x 2
#>    title                 genre             
#>    <chr>                 <chr>             
#>  1 Harry Potter 1        Fantasy           
#>  2 Harry Potter 1        Young Adult       
#>  3 Harry Potter 1        Magic             
#>  4 To Kill A Mockingbird Classics          
#>  5 To Kill A Mockingbird Fiction           
#>  6 To Kill A Mockingbird Historical        
#>  7 To Kill A Mockingbird Historical Fiction
#>  8 To Kill A Mockingbird Academic          
#>  9 The Hunger Games 1    Young Adult       
#> 10 The Hunger Games 1    Fiction           
#> 11 The Hunger Games 1    Science Fiction   
#> 12 The Hunger Games 1    Dystopia

A shortcut here that I always forget about is separate_rows, which does the splitting and unnesting in one step:

books %>%
  separate_rows(genre, sep = ", ") %>%
  distinct()

is equivalent to the previous block.

To get this to a wide format, you can use tidyr::spread. To make column names "genre1", "genre2", etc. dynamically, I grouped by title and then numbered the unique genres for each title. That way, you don't need to know how many genre columns you need, like you would if you used tidyr::separate to split the column instead.

books %>%
  mutate(genre = str_split(genre, ", ")) %>%
  unnest(genre) %>%
  distinct() %>%
  group_by(title) %>%
  mutate(num = row_number() %>% paste0("genre", .)) %>%
  spread(key = num, value = genre)
#> # A tibble: 3 x 6
#> # Groups:   title [3]
#>   title                 genre1      genre2      genre3    genre4    genre5
#>   <chr>                 <chr>       <chr>       <chr>     <chr>     <chr> 
#> 1 Harry Potter 1        Fantasy     Young Adult Magic     <NA>      <NA>  
#> 2 The Hunger Games 1    Young Adult Fiction     Science … Dystopia  <NA>  
#> 3 To Kill A Mockingbird Classics    Fiction     Historic… Historic… Acade…

Here a solution using data.table and base R.

library(data.table)
setDT(books)

books = unique(books[, strsplit(genre, ", "), by = title])
books[, genre:= paste0("genre_", seq_along(V1)), by = title]
dcast(books, title ~ genre, value.var = "V1")
#                    title     genre_1     genre_2         genre_3            genre_4  genre_5
# 1:        Harry Potter 1     Fantasy Young Adult           Magic               <NA>     <NA>
# 2:    The Hunger Games 1 Young Adult     Fiction Science Fiction           Dystopia     <NA>
# 3: To Kill A Mockingbird    Classics     Fiction      Historical Historical Fiction Academic

You can use separate with a step to remove non-unique genres before using separate.

library(tidyverse)
library(magrittr)

#remove non-unique genres
books %<>% mutate(genre = map(str_split(genre, ', '), ~ paste(unique(.x), collapse = ','))) 

#separate into columns
books %>% 
  separate(col = 2, into = paste0('genre', seq(max(str_count(books$genre, ',')) + 1L))
           , sep = ',')


# # A tibble: 3 x 6
#   title                 genre1      genre2      genre3          genre4             genre5  
#   <chr>                 <chr>       <chr>       <chr>           <chr>              <chr>   
# 1 Harry Potter 1        Fantasy     Young Adult Magic           NA                 NA      
# 2 To Kill A Mockingbird Classics    Fiction     Historical      Historical Fiction Academic
# 3 The Hunger Games 1    Young Adult Fiction     Science Fiction Dystopia           NA