PostgreSQL SQL GROUP BY time interval with arbitrary accuracy (down to milliseconds)

Solution 1:

You can generate a table of "buckets" by adding intervals created by generate_series(). This SQL statement generates a table of five-minute buckets for the first day in your data (the date of min(measured_at)).

select 
  (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,
  (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n

Wrap that statement in a common table expression, and you can join and group on it as if it were a base table.

with five_min_intervals as (
  select 
    (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,
    (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
  from generate_series(0, (24*60), 5) n
)
select f.start_time, f.end_time, avg(m.val) avg_val 
from measurements m
right join five_min_intervals f 
        on m.measured_at >= f.start_time and m.measured_at < f.end_time
group by f.start_time, f.end_time
order by f.start_time

Grouping by an arbitrary number of seconds is similar: step generate_series() in seconds instead of minutes, and anchor the series with date_trunc() (or the ::date cast used above).
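For example, here is a sketch of the same approach with 15-second buckets over the first day of data (it assumes the same measurements table and the measured_at and val columns used above):

with fifteen_sec_intervals as (
  select
    date_trunc('day', (select min(measured_at) from measurements)) + ( n     || ' seconds')::interval start_time,
    date_trunc('day', (select min(measured_at) from measurements)) + ((n+15) || ' seconds')::interval end_time
  from generate_series(0, 24*60*60, 15) n
)
select f.start_time, f.end_time, avg(m.val) avg_val
from measurements m
right join fifteen_sec_intervals f
        on m.measured_at >= f.start_time and m.measured_at < f.end_time
group by f.start_time, f.end_time
order by f.start_time;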


A more general use of generate_series() lets you avoid guessing the upper limit for five-minute buckets. In practice, you'd probably build this as a view or a function. You might get better performance from a base table.

select 
  (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,
  (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n;
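
If you go the view route mentioned above, a minimal sketch, reusing the five_min_intervals name from the CTE earlier:

create or replace view five_min_intervals as
select 
  (select min(measured_at)::date from measurements) + ( n    || ' minutes')::interval start_time,
  (select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n;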

Solution 2:

Catcall has a great answer. My example of using it demonstrates fixed buckets, in this case 30-minute intervals starting at midnight. It also shows that Catcall's first version can generate one extra bucket, and how to eliminate it; I wanted exactly 48 buckets in a day. In my problem, observations have separate date and time columns, and I want to average the observations within each 30-minute period across the month for a number of different services.

with intervals as (
    select
        (n||' minutes')::interval as start_time, 
        ((n+30)|| ' minutes')::interval as end_time
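    -- an upper bound of 23*60+30 (= 1410) yields exactly 48 half-hour buckets; 24*60 would generate a 49th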
    from generate_series(0, (23*60+30), 30) n
)
select i.start_time, o.service, avg(o.o)
from observations o
right join intervals i
        on o.time >= i.start_time and o.time < i.end_time
where o.date between '2013-01-01' and '2013-01-31'
group by i.start_time, i.end_time, o.service
order by i.start_time

Solution 3:

How about

SELECT MIN(val), 
floor(EXTRACT(epoch FROM measured_at) / EXTRACT(epoch FROM INTERVAL '5 min')) AS int 
FROM measurements 
GROUP BY int

where '5 min' can be any expression supported by INTERVAL.
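
If you also want each bucket's start time rather than just an ordinal, one possible sketch (assuming measured_at is a timestamp column in the same measurements table, with 300 seconds as the 5-minute width):

-- to_timestamp() converts the bucket's epoch seconds back into a timestamptz
SELECT to_timestamp(floor(EXTRACT(epoch FROM measured_at) / 300) * 300) AS bucket_start, 
       MIN(val), 
       AVG(val) 
FROM measurements 
GROUP BY bucket_start 
ORDER BY bucket_start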