Insert rows for missing dates/times

This is an old question, but I just wanted to post a dplyr way of handling this, as I came across this post while searching for an answer to a similar problem. I find it more intuitive and easier on the eyes than the zoo approach.

library(dplyr)

ts <- seq.POSIXt(as.POSIXct("2001-09-01 0:00",'%m/%d/%y %H:%M'), as.POSIXct("2001-09-01 0:07",'%m/%d/%y %H:%M'), by="min")

ts <- seq.POSIXt(as.POSIXlt("2001-09-01 0:00"), as.POSIXlt("2001-09-01 0:07"), by="min")
ts <- format.POSIXct(ts,'%m/%d/%y %H:%M')

df <- data.frame(timestamp=ts)

data_with_missing_times <- full_join(df,original_data)

   timestamp     tr tt sr st
1 09/01/01 00:00 15 15 78 42
2 09/01/01 00:01 20 64 98 87
3 09/01/01 00:02 31 84 23 35
4 09/01/01 00:03 21 63 54 20
5 09/01/01 00:04 15 23 36 15
6 09/01/01 00:05 NA NA NA NA
7 09/01/01 00:06 NA NA NA NA
8 09/01/01 00:07 NA NA NA NA

Also using dplyr, this makes it easier to do something like change all those missing values to something else, which came in handy for me when plotting in ggplot.

data_with_missing_times %>% group_by(timestamp) %>% mutate_each(funs(ifelse(is.na(.),0,.)))

   timestamp     tr tt sr st
1 09/01/01 00:00 15 15 78 42
2 09/01/01 00:01 20 64 98 87
3 09/01/01 00:02 31 84 23 35
4 09/01/01 00:03 21 63 54 20
5 09/01/01 00:04 15 23 36 15
6 09/01/01 00:05  0  0  0  0
7 09/01/01 00:06  0  0  0  0
8 09/01/01 00:07  0  0  0  0

I think the easiest thing ist to set Date first as already described, convert to zoo, and then just set a merge:

df$timestamp<-as.POSIXct(df$timestamp,format="%m/%d/%y %H:%M")

df1.zoo<-zoo(df[,-1],df[,1]) #set date to Index

df2 <- merge(df1.zoo,zoo(,seq(start(df1.zoo),end(df1.zoo),by="min")), all=TRUE)

Start and end are given from your df1 (original data) and you are setting by - e.g min - as you need for your example. all=TRUE sets all missing values at the missing dates to NAs.

Date padding is implemented in the padr package in R. If you store your data frame, with your date-time variable stored as POSIXct or POSIXlt. All you need to do is:

library(padr)
pad(df_name)

See vignette("padr") or this blog post for its working.

I think this can accomplished by using complete in tidyr package.

library(tidyverse)
df <- df %>%
      complete(timestamp = seq.POSIXt(min(timestamp), max(timestamp), by = "minute"), 
               tr, tt, sr,st)

you can also initialize your start date and end date instead of using min(timestamp) and max(timestamp).

Insert rows for missing dates/times

Related

Recent Posts