Binning time series values

It depends crucially on what you want to use your time series for. I'll look at this mainly from a forecasting point of view, because that is what I know most about.

  • If you are forecasting to replenish a supermarket with weekly deliveries, use sales summed into weekly bins. If you have multiple deliveries per week, daily bins may be useful, or irregularly sized bins that correspond to the intervals between deliveries.
  • If you are looking at call centers, you may need hourly bins for scheduling staff, daily bins for capacity planning such as approving vacations, and quarterly or yearly bins for planning future capacity, e.g., deciding whether to hire more agents or to enter into an outsourcing agreement.
  • If your data are timestamped sensor readings, you will probably not want to sum them but rather take averages within bins, or perhaps minima or maxima. Choosing a bin here is a question of what frequency is most informative, of what bin size you can reasonably process when there are large amounts of data, and, again, of what you plan to use your time series and/or forecasts for. E.g., if you want to monitor a process to assess when maintenance is necessary, anything from microsecond to weekly bins may be useful.
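The cases above differ mainly in two choices: the bin boundaries (fixed width, or irregular such as delivery intervals) and the aggregation function (sum for sales, mean/min/max for sensor readings). Below is a minimal standard-library sketch of both, assuming the data arrive as (timestamp, value) pairs. The helper names `bin_fixed` and `bin_by_breakpoints` and all the data are hypothetical; in practice a dataframe library's resampling (e.g. pandas `resample`) would usually do this job.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def bin_fixed(records, origin, width, agg=sum):
    """Aggregate (timestamp, value) pairs into fixed-width bins.

    Bins are [origin + k*width, origin + (k+1)*width). Empty bins are
    omitted rather than filled, since e.g. a mean over no readings is undefined.
    """
    groups = defaultdict(list)
    for ts, value in records:
        k = (ts - origin) // width  # timedelta // timedelta -> integer bin index
        groups[origin + k * width].append(value)
    return {start: agg(vals) for start, vals in sorted(groups.items())}


def bin_by_breakpoints(records, breakpoints, agg=sum):
    """Aggregate into irregular bins [b[i], b[i+1]), e.g. intervals between deliveries.

    Records before the first breakpoint or at/after the last one are dropped.
    """
    groups = defaultdict(list)
    for ts, value in records:
        i = bisect_right(breakpoints, ts) - 1  # index of the bin start at or before ts
        if 0 <= i < len(breakpoints) - 1:
            groups[breakpoints[i]].append(value)
    return {start: agg(vals) for start, vals in sorted(groups.items())}


# Hypothetical supermarket sales: (timestamp, units sold).
sales = [
    (datetime(2024, 1, 1, 9), 3),
    (datetime(2024, 1, 2, 14), 5),
    (datetime(2024, 1, 9, 10), 2),
]
monday = datetime(2024, 1, 1)  # 1 January 2024 was a Monday
weekly_sales = bin_fixed(sales, monday, timedelta(weeks=1))
daily_sales = bin_fixed(sales, monday, timedelta(days=1))

# Irregular bins between hypothetical delivery times.
deliveries = [datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 10)]
per_delivery = bin_by_breakpoints(sales, deliveries)

# Hypothetical sensor readings: average (or take the max) rather than sum.
readings = [
    (datetime(2024, 1, 1, 0, 0), 20.0),
    (datetime(2024, 1, 1, 0, 30), 22.0),
    (datetime(2024, 1, 1, 1, 15), 19.0),
]
hourly_mean = bin_fixed(readings, monday, timedelta(hours=1), agg=mean)
hourly_max = bin_fixed(readings, monday, timedelta(hours=1), agg=max)
```

Here `weekly_sales` is `{2024-01-01: 8, 2024-01-08: 2}`, while `per_delivery` groups the same sales by the interval between deliveries. Passing the aggregation function as a parameter keeps the binning logic the same whether you sum, average or take extremes.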

Note that different time bins may lead to multiple seasonalities: weekly supermarket sales have intra-yearly seasonality, but daily sales have both intra-yearly and intra-weekly seasonality. And short bins may yield intermittent time series.
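To see how short bins produce intermittency, here is a minimal sketch that sums the same hypothetical slow-moving item's sales into daily and weekly bins and reports the share of bins with zero demand. The function names `bin_sales` and `zero_share` and the data are illustrative assumptions, not part of any library.

```python
from datetime import date


def bin_sales(sales, start, n_days, width_days):
    """Sum (date, quantity) sales into consecutive bins of width_days days.

    Unlike sensor averages, empty sales bins are genuine zeros, so they are kept.
    """
    n_bins = -(-n_days // width_days)  # ceiling division
    totals = [0] * n_bins
    for d, qty in sales:
        offset = (d - start).days
        if 0 <= offset < n_days:
            totals[offset // width_days] += qty
    return totals


def zero_share(totals):
    """Fraction of bins with no demand, a simple indicator of intermittency."""
    return sum(1 for t in totals if t == 0) / len(totals)


# Hypothetical slow mover: roughly one sale per week over four weeks.
sales = [(date(2024, 1, 3), 1), (date(2024, 1, 10), 2),
         (date(2024, 1, 17), 1), (date(2024, 1, 25), 1)]
start = date(2024, 1, 1)
daily = bin_sales(sales, start, 28, 1)   # 28 daily bins
weekly = bin_sales(sales, start, 28, 7)  # 4 weekly bins
```

The weekly series is `[1, 2, 1, 1]` with no zeros, whereas 24 of the 28 daily bins are zero: the same demand looks smooth at one granularity and intermittent at another.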

Note also that if you are interested in multiple granularities, there are ways of leveraging the temporal relationships involved (e.g., Kourentzes & Petropoulos, 2016).
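Approaches of this kind work with the same series aggregated at several non-overlapping granularities, where each coarser value is the sum of the finer values it covers. The sketch below shows only that construction step, with hypothetical helper names and data; it does not implement the forecasting or combination method of the cited paper.

```python
def aggregate(series, k):
    """Non-overlapping sums of k consecutive periods; a trailing partial block is dropped."""
    n_full = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) for i in range(n_full)]


def temporal_levels(series, factors):
    """Build one aggregated series per aggregation factor (1 = original granularity)."""
    return {k: aggregate(series, k) for k in factors}


# Hypothetical two weeks of daily sales.
daily = [5, 3, 4, 6, 8, 12, 10,
         6, 2, 5, 7, 9, 13, 11]
levels = temporal_levels(daily, [1, 7])  # daily and weekly views of the same data
```

Here `levels[7]` is `[48, 53]`, and both levels add up to the same total of 101; this additive link between granularities is the kind of temporal relationship that can be exploited when forecasting at several levels.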