Aggregating hourly data into daily aggregates
1) This can be done compactly using zoo:
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
library(zoo)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)
This gives:
> z
           min max     mean
2000-01-01  30  33 31.33333
2000-12-31  25  25 25.00000
2) Here is a solution that uses only core R:
DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution
The last line gives:
> ag
        Date  DBT.min  DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000
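One caveat with the core R result: although the printout shows DBT.min, DBT.max and DBT.mean, ag really has only two columns, Date and a matrix named DBT. If you need ordinary columns, something like the following should work. It is a minimal sketch that repeats the sample data so it runs on its own; ag_flat is just an illustrative name.

```r
# Same sample data and summary function as in the solution above
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))

DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat)

# ag has two columns: Date and a three-column matrix called DBT
dim(ag)

# Passing the columns of ag to data.frame() expands the matrix
# into separate DBT.min, DBT.max and DBT.mean columns
ag_flat <- do.call(data.frame, ag)
names(ag_flat)
```

After this, ag_flat$DBT.max and friends can be used like any other column.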
EDIT: (1) Since this first appeared, the text= argument to read.zoo was added in the zoo package. (2) Minor improvements.
Using strptime(), trunc() and ddply() from the plyr package:
#Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ, header = TRUE)
close(ZZ)
# Do the calculations
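# strptime() returns POSIXlt; convert to POSIXct, which is better suited to data frame columns
dataframe$Date <- as.POSIXct(strptime(dataframe$Date, format = "%m/%d/%Y %H:%M"))
dataframe$day <- as.POSIXct(trunc(dataframe$Date, "day"))
library(plyr)
ddply(dataframe, .(day),
      summarize,
      aveDBT = mean(DBT),
      maxDBT = max(DBT),
      minDBT = min(DBT)
)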
gives
         day   aveDBT maxDBT minDBT
1 2000-01-01 31.33333     33     30
2 2000-12-31 25.00000     25     25
To clarify: strptime converts the character strings to date-times according to the format. To see how you can specify the format, see ?strptime. trunc will then truncate these date-times to the specified unit, which is day in this case. ddply will evaluate the function summarize within the data frame after splitting it up according to day. Everything after summarize is an argument that is passed to the function summarize.
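To see what strptime and trunc do on their own, here is a minimal sketch with a single timestamp in the same format as the sample data. The tz = "UTC" argument and the variable names are my own choices for the illustration, not part of the solution above.

```r
# One timestamp in the same format as the sample data;
# tz = "UTC" is an assumption so the result does not depend on the local time zone
x <- strptime("01/01/2000 02:00", format = "%m/%d/%Y %H:%M", tz = "UTC")
class(x)   # strptime() returns a POSIXlt date-time

# Truncating to days resets the time of day to midnight,
# so every hour of the same day ends up with the same value
day_start <- trunc(x, "days")
format(day_start, "%Y-%m-%d %H:%M")
```

Because all hours of one day truncate to the same value, that value can serve as the grouping variable for ddply.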
There is also a nice package called hydroTSM. It uses zoo objects and can convert to other aggregates in time. The function in your case is subdaily2daily. You can choose if the aggregation should be based on min / max / mean...
A couple of options:
1. Timetk
If you have a data frame (or tibble) then the summarize_by_time() function from timetk can be used:
library(tidyverse)
library(timetk)
# Collect Data
text <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
df <- read_csv(text, col_types = cols(Date = col_datetime("%m/%d/%Y %H:%M")))
df
#> # A tibble: 4 x 2
#>   Date                  DBT
#>   <dttm>              <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25
# Summarize
df %>%
  summarise_by_time(
    .date_var = Date,
    .by = "day",
    min = min(DBT),
    max = max(DBT),
    mean = mean(DBT)
  )
#> # A tibble: 2 x 4
#>   Date                  min   max  mean
#>   <dttm>              <dbl> <dbl> <dbl>
#> 1 2000-01-01 00:00:00    30    33  31.3
#> 2 2000-12-31 00:00:00    25    25  25
Created on 2021-05-21 by the reprex package (v2.0.0)
2. Tidyquant
You can use the tidyquant package for this. The process involves using the tq_transmute function to return a data frame that is modified using the xts aggregation function, apply.daily. We'll apply a custom stat_fun, which returns the min, max and mean. However, you can apply any vector function you'd like, such as quantile.
library(tidyquant)
df
#> # A tibble: 4 x 2
#>   Date                  DBT
#>   <dttm>              <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25
stat_fun <- function(x) c(min = min(x), max = max(x), mean = mean(x))
df %>%
  tq_transmute(select = DBT,
               mutate_fun = apply.daily,
               FUN = stat_fun)
#> # A tibble: 2 x 4
#>   Date                  min   max     mean
#>   <dttm>              <dbl> <dbl>    <dbl>
#> 1 2000-01-01 03:00:00    30    33 31.33333
#> 2 2000-12-31 23:00:00    25    25 25.00000