Calculate rolling / moving average in C++

If your needs are simple, you might just try using an exponential moving average.

http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average

Put simply, you make an accumulator variable, and as your code looks at each sample, the code updates the accumulator with the new value. You pick a constant "alpha" that is between 0 and 1, and compute this:

accumulator = (alpha * new_value) + (1.0 - alpha) * accumulator

You just need to find a value of "alpha" where the effect of a given sample only lasts for about 1000 samples.

Hmm, I'm not actually sure this is suitable for you, now that I've put it here. The problem is that 1000 is a pretty long window for an exponential moving average; I'm not sure there is an alpha that would spread the average over the last 1000 numbers, without underflow in the floating point calculation. But if you wanted a smaller average, like 30 numbers or so, this is a very easy and fast way to do it.
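
For illustration, here's a minimal sketch of that update loop (the sample values and alpha here are arbitrary; a common rule of thumb for an N-sample "window" is alpha ≈ 2/(N+1)):

#include <cstddef>
#include <iostream>

int main()
{
    const double samples[] = {4.0, 7.0, 2.0, 6.0};
    const double alpha = 0.1;              // between 0 and 1; smaller = longer memory
    double accumulator = samples[0];       // seed with the first sample

    for (std::size_t i = 1; i < sizeof samples / sizeof samples[0]; ++i)
        accumulator = alpha * samples[i] + (1.0 - alpha) * accumulator;

    std::cout << accumulator << '\n';      // exponentially weighted average of the stream
}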


You simply need a circular array (circular buffer) of 1000 elements, where each new element you store is the incoming sample added to the previous element.

It becomes an increasing sum, so you can always get the sum of the samples between any two positions (as the difference of the two stored values) and divide by the number of samples between them, to yield the average.
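
A sketch of that idea (the class name and interface here are just illustrative; the window size is arbitrary):

#include <cassert>
#include <cstddef>
#include <vector>

class Cumulative_Ring
{
  public:
    explicit Cumulative_Ring(std::size_t window)     // e.g. 1000
      : sums_(window + 1, 0.0)                       // one extra slot for the "sum before the window"
    {}

    void add(double sample)
    {
        double newest = sums_[pos_] + sample;        // current slot holds the latest running sum
        pos_ = (pos_ + 1) % sums_.size();
        sums_[pos_] = newest;
        if (count_ < sums_.size() - 1) ++count_;
    }

    // Average of the most recent k samples (1 <= k <= samples seen, up to the window size).
    double average(std::size_t k) const
    {
        assert(k > 0 && k <= count_);
        std::size_t older = (pos_ + sums_.size() - k) % sums_.size();
        return (sums_[pos_] - sums_[older]) / k;
    }

  private:
    std::vector<double> sums_;
    std::size_t pos_ = 0;                            // index of the most recently written sum
    std::size_t count_ = 0;                          // samples seen, capped at the window size
};

With a window of 1000, average(1000) gives the moving average of the most recent 1000 samples once at least 1000 have been seen. Note that the stored sums are never refreshed, so the floating point precision caveat discussed further down applies here too.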


Basically I want to track the moving average of an ongoing stream of floating point numbers using the most recent 1000 numbers as a data sample.

Note that the code below updates total_ as elements are added/replaced, avoiding a costly O(N) traversal to calculate the sum - needed for the average - on demand.

#include <algorithm>
#include <cstddef>

template <typename T, typename Total, size_t N>
class Moving_Average
{
  public:
    Moving_Average& operator()(T sample)
    {
        total_ += sample;
        if (num_samples_ < N)
            samples_[num_samples_++] = sample;    // still filling the buffer
        else
        {
            T& oldest = samples_[num_samples_++ % N];
            total_ -= oldest;                     // evict the oldest sample from the total
            oldest = sample;
        }
        return *this;
    }

    operator double() const
    {
        // cast so the average isn't truncated when Total is an integral type
        return static_cast<double>(total_) / std::min(num_samples_, N);
    }

  private:
    T samples_[N];
    size_t num_samples_{0};
    Total total_{0};
};

Examples:

// average of last 3 (from 4) samples...
std::cout << Moving_Average<double, double, 3>{}(4)(7)(2)(6) << '\n';
    // "5\n"

// average of last 3 squares...
Moving_Average<double, double, 3> ma;
for (int i = 0; i < 10; ++i)
    std::cout << (i * i) << ':' << ma(i * i) << ' ';
std::cout << '\n';
    // 0:0 1:0.5 4:1.66667 9:4.66667 16:9.66667 25:16.6667 36:25.6667 49:36.6667 64:49.6667 81:64.6667

Total is made a different parameter from T to support e.g. using a long long when totalling 1000 longs, an int for chars, or a double to total floats.
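
For example:

Moving_Average<long, long long, 1000> avg_longs;   // 1000 longs totalled in a long long
Moving_Average<char, int, 256> avg_chars;          // chars totalled in an int
Moving_Average<float, double, 100> avg_floats;     // floats totalled in a double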

Issues

This is a bit flawed in that num_samples_ could conceptually wrap back to 0, but it's hard to imagine anyone having 2^64 samples: if that's a concern, use an extra bool data member to record when the container is first filled, while cycling num_samples_ around the array (best then renamed to something innocuous like "pos"), as sketched below.
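
One way that variant might look (a sketch only; the names pos_ and full_ are mine):

#include <cstddef>

template <typename T, typename Total, size_t N>
class Moving_Average
{
  public:
    Moving_Average& operator()(T sample)
    {
        total_ += sample;
        if (full_)
            total_ -= samples_[pos_];              // evict the oldest sample from the total
        samples_[pos_] = sample;
        if (++pos_ == N) { pos_ = 0; full_ = true; }
        return *this;
    }

    operator double() const
    {
        return static_cast<double>(total_) / (full_ ? N : pos_);
    }

  private:
    T samples_[N];
    size_t pos_{0};        // next slot to write; never exceeds N, so it cannot wrap
    bool full_{false};     // set once the buffer has been filled
    Total total_{0};
};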

Another issue is inherent to floating point precision, and can be illustrated with a simple scenario for T=double, N=2: we start with total_ = 0, then inject samples {1E17, 1, 2}...

  • 1E17, we execute total_ += 1E17, so total_ == 1E17, then inject

  • 1, we execute total_ += 1, but total_ == 1E17 still, as the "1" is too insignificant to change the 64-bit double representation of a number as large as 1E17, then we inject

  • 2, we execute total_ += 2, but the 2 is also lost to imprecision/insignificance, so total_ stays at 1E17; we then evict the oldest sample with total_ -= 1E17, and total_ becomes 0, despite current samples of 1 and 2 for which we'd want total_ to be 3. Our moving average will calculate 0 instead of 1.5. As we add another sample, we'll subtract the "oldest" 1 from total_ despite it never having been properly incorporated therein; our total_ and moving averages are likely to remain wrong.
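
That scenario is easy to reproduce:

Moving_Average<double, double, 2> ma;
std::cout << ma(1E17)(1)(2) << '\n';    // prints 0, not the 1.5 we'd expect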

You could add code that stores the highest recent total_, and if the current total_ is too small a fraction of that (a template parameter could provide a multiplicative threshold), recalculates total_ from all the samples in the samples_ array (and sets highest_recent_total_ to the new total_) - but I'll leave that to the reader who cares sufficiently.
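
For completeness, a rough sketch of what that safeguard might look like as an extra member function (the names highest_recent_total_ and threshold_ are mine, not part of the class above):

// Requires <cmath>; assumes two extra data members:
//   double highest_recent_total_{0};   // largest |total_| seen recently
//   double threshold_{0.01};           // or supply via an extra template parameter
void maybe_rebuild_total()
{
    double magnitude = std::fabs(static_cast<double>(total_));
    if (magnitude >= highest_recent_total_)
        highest_recent_total_ = magnitude;
    else if (magnitude < threshold_ * highest_recent_total_)
    {
        // total_ has collapsed relative to recent history: rebuild it from the samples
        total_ = Total{0};
        for (size_t i = 0; i < std::min(num_samples_, N); ++i)
            total_ += samples_[i];
        highest_recent_total_ = std::fabs(static_cast<double>(total_));
    }
}

One could call this from operator() after each update; the O(N) rebuild then only runs when total_ has actually collapsed.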