pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols"

My data looks like this:

# A tibble: 6 x 4
  name          val time          x1
  <chr>       <dbl> <date>     <dbl>
1 C Farolillo     7 2016-04-20  51.5
2 C Farolillo     3 2016-04-21  56.3
3 C Farolillo     7 2016-04-22  56.3
4 C Farolillo    13 2016-04-23  57.9
5 C Farolillo     7 2016-04-24  58.7
6 C Farolillo     9 2016-04-25  59.0

I am trying to use the pivot_wider function to expand out the data based on the name column. I use the following code:

yy <- d %>% 
  pivot_wider(., names_from = name, values_from = val)

Which gives me the following warning message:

Warning message:
Values in `val` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(val = list)` to suppress this warning.
* Use `values_fn = list(val = length)` to identify where the duplicates arise
* Use `values_fn = list(val = summary_fun)` to summarise duplicates

The output looks like:

       time       x1        out1    out2 
    2016-04-20  51.50000    <dbl>   <dbl>
2   2016-04-21  56.34615    <dbl>   <dbl>
3   2016-04-22  56.30000    <dbl>   <dbl>
4   2016-04-23  57.85714    <dbl>   <dbl>
5   2016-04-24  58.70968    <dbl>   <dbl>
6   2016-04-25  58.96774    <dbl>   <dbl>

I know that here mentions the issue and to resolve it they suggest using summary statistics. However I have time series data and thus do not want to use summary statistics since each day has a single value (and not multiple values).

I know the problem is because the val column has duplicates (i.e. in the above example 7 occurs 3 times.

Any suggestions on how to pivot_wider and overcome this issue?

Data:

    d <- structure(list(name = c("C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", "C Farolillo", 
"C Farolillo", "C Farolillo", "C Farolillo", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", "Plaza Eliptica", 
"Plaza Eliptica", "Plaza Eliptica"), val = c(7, 3, 7, 13, 7, 
9, 20, 19, 4, 5, 5, 2, 6, 6, 16, 13, 7, 6, 3, 3, 6, 10, 5, 3, 
5, 3, 4, 4, 10, 11, 4, 13, 8, 2, 8, 10, 3, 10, 14, 4, 2, 4, 6, 
6, 8, 8, 3, 3, 13, 10, 13, 32, 25, 31, 34, 26, 33, 35, 43, 22, 
22, 21, 10, 33, 33, 48, 47, 27, 23, 11, 13, 25, 31, 20, 16, 10, 
9, 23, 11, 23, 26, 16, 34, 17, 4, 24, 21, 10, 26, 32, 10, 5, 
9, 19, 14, 27, 27, 10, 8, 28, 32, 25), time = structure(c(16911, 
16912, 16913, 16914, 16915, 16916, 16917, 16918, 16919, 16920, 
16921, 16922, 16923, 16923, 16924, 16925, 16926, 16927, 16928, 
16929, 16930, 16931, 16932, 16933, 16934, 16935, 16936, 16937, 
16938, 16939, 16940, 16941, 16942, 16943, 16944, 16945, 16946, 
16947, 16948, 16949, 16950, 16951, 16952, 16953, 16954, 16955, 
16956, 16957, 16958, 16959, 16960, 16911, 16912, 16913, 16914, 
16915, 16916, 16917, 16918, 16919, 16920, 16921, 16922, 16923, 
16923, 16924, 16925, 16926, 16927, 16928, 16929, 16930, 16931, 
16932, 16933, 16934, 16935, 16936, 16937, 16938, 16939, 16940, 
16941, 16942, 16943, 16944, 16945, 16946, 16947, 16948, 16949, 
16950, 16951, 16952, 16953, 16954, 16955, 16956, 16957, 16958, 
16959, 16960), class = "Date"), x1 = c(51.5, 56.3461538461538, 
56.3, 57.8571428571429, 58.7096774193548, 58.9677419354839, 64.4615384615385, 
61.9310344827586, 60.3214285714286, 59.4137931034483, 59.5806451612903, 
57.3448275862069, 64.0333333333333, 64.0333333333333, 70.15625, 
71.3636363636364, 62.8125, 56.4375, 56.4516129032258, 51.741935483871, 
52.84375, 53.09375, 52.969696969697, 54, 54.3870967741936, 60.3870967741936, 
64.4516129032258, 66.2903225806452, 68.2333333333333, 69.7741935483871, 
70.5806451612903, 73.8275862068966, 72.8181818181818, 64.6764705882353, 
64.4838709677419, 68.7741935483871, 62.1764705882353, 68.969696969697, 
70.1935483870968, 59.6774193548387, 59.9677419354839, 63.125, 
67.5882352941177, 71.4705882352941, 73.8529411764706, 76.1935483870968, 
72.6451612903226, 76.0645161290323, 76.4193548387097, 81.7741935483871, 
85.0645161290323, 51.5, 56.3461538461538, 56.3, 57.8571428571429, 
58.7096774193548, 58.9677419354839, 64.4615384615385, 61.9310344827586, 
60.3214285714286, 59.4137931034483, 59.5806451612903, 57.3448275862069, 
64.0333333333333, 64.0333333333333, 70.15625, 71.3636363636364, 
62.8125, 56.4375, 56.4516129032258, 51.741935483871, 52.84375, 
53.09375, 52.969696969697, 54, 54.3870967741936, 60.3870967741936, 
64.4516129032258, 66.2903225806452, 68.2333333333333, 69.7741935483871, 
70.5806451612903, 73.8275862068966, 72.8181818181818, 64.6764705882353, 
64.4838709677419, 68.7741935483871, 62.1764705882353, 68.969696969697, 
70.1935483870968, 59.6774193548387, 59.9677419354839, 63.125, 
67.5882352941177, 71.4705882352941, 73.8529411764706, 76.1935483870968, 
72.6451612903226, 76.0645161290323, 76.4193548387097, 81.7741935483871, 
85.0645161290323)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-102L))

Solution 1:

Create a unique identifier row for each name and then use pivot_wider

library(dplyr)

d %>%
  group_by(name) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = name, values_from = val) %>%
  select(-row)

# A tibble: 51 x 4
#   time          x1 `C Farolillo` `Plaza Eliptica`
#   <date>     <dbl>         <dbl>            <dbl>
# 1 2016-04-20  51.5             7               32
# 2 2016-04-21  56.3             3               25
# 3 2016-04-22  56.3             7               31
# 4 2016-04-23  57.9            13               34
# 5 2016-04-24  58.7             7               26
# 6 2016-04-25  59.0             9               33
# 7 2016-04-26  64.5            20               35
# 8 2016-04-27  61.9            19               43
# 9 2016-04-28  60.3             4               22
#10 2016-04-29  59.4             5               22
# … with 41 more rows

Solution 2:

Typically the error

Warning message:
Values in `val` are not uniquely identified; output will contain list-cols.

is most often caused by duplicate rows in the data (after excluding the val column), and not duplicates in the val column.

which(duplicated(d))
# [1] 14 65

OP's data seems to have two duplicate rows which is causing this issue. Removing the duplicate rows also gets rid of the error.

yy <- d %>% distinct() %>% pivot_wider(., names_from = name, values_from = val)
yy
# A tibble: 50 x 4
   time          x1 `C Farolillo` `Plaza Eliptica`
   <date>     <dbl>         <dbl>            <dbl>
 1 2016-04-20  51.5             7               32
 2 2016-04-21  56.3             3               25
 3 2016-04-22  56.3             7               31
 4 2016-04-23  57.9            13               34
 5 2016-04-24  58.7             7               26
 6 2016-04-25  59.0             9               33
 7 2016-04-26  64.5            20               35
 8 2016-04-27  61.9            19               43
 9 2016-04-28  60.3             4               22
10 2016-04-29  59.4             5               22
# ... with 40 more rows

Solution 3:

The problem is caused by the fact that the data that you want to spread / pivot wider has duplicate identifiers. While both suggestions above, i.e. creating a unique artifical id from row numbers with mutate(row = row_number()) , or filtering only distinct rows will allow you to pivot wider, but they change the structure of your table, which is likely to have an logical, organizational problem that will come out next time you try to join anything to it.

It is a much better practice to use the id_cols parameter explicity, to see that you actually want to have to be unique after pivotting wide, and if you are running into problems, re-organize the original table first. Of course, you may find reason for filtering to distinct rows, or adding a new ID, most likely you will want to avoid the duplication earlier in your code.