Optimizing filtering and thresholding parameters before integrating accelerometer data to obtain displacement

In my project, I am smoothing, thresholding, and integrating accelerometer data twice to calculate displacement, and comparing the result against a high-precision camera. I need help finding a numerical optimization algorithm that would give optimal values for the smoothing-filter lengths and the threshold values, minimizing the error between the displacement measured by the sensor and the displacement captured by the high-precision camera across all recorded sessions.

Before I get into the specifics, here is data from one recorded session (the x-axis is in seconds in both plots). The green line in the top plot shows the accelerometer estimate: the sensor has moved 35 centimeters. The bottom plot shows the high-precision camera measurement: the sensor has moved 30 centimeters. That is an acceptable difference for my use case.

Currently, I am sampling my accelerometer at 70 samples per second. Since the accelerometer data is noisy, a threshold of $\pm$0.031 is applied before further processing. The accelerometer data is further cleaned up by applying a moving average filter of length 12 and thresholding again at $\pm$0.03. Velocity is obtained by thresholding again at $\pm$0.15, applying a moving average filter of length 12, and then integrating with the rectangular method. Finally, the displacement is obtained by integrating the velocity data with the rectangular method again.
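
For concreteness, here is roughly what that pipeline looks like in code. This is only a simplified sketch: the function names are illustrative, and the exact placement of the second threshold/filter pair (on acceleration before the first integration versus on velocity before the second) is my reading of the steps, not necessarily the only one.

```python
import numpy as np

FS = 70.0        # sampling rate in samples per second
DT = 1.0 / FS    # time step used by the rectangular rule

def threshold(x, t):
    """Zero out samples whose magnitude falls below the threshold t."""
    y = x.copy()
    y[np.abs(y) < t] = 0.0
    return y

def moving_average(x, length):
    """Simple moving average filter of the given length."""
    if length <= 1:
        return x
    return np.convolve(x, np.ones(length) / length, mode="same")

def accel_to_displacement(accel, ma1=12, th1=0.03, th2=0.15, ma2=12):
    """One reading of the pipeline: fixed pre-threshold at +/-0.031,
    MA/threshold on acceleration, rectangular integration to velocity,
    threshold/MA on velocity, then a second rectangular integration."""
    a = threshold(accel, 0.031)                 # fixed pre-threshold on raw data
    a = threshold(moving_average(a, ma1), th1)  # 1st MA length, 1st threshold
    v = np.cumsum(a) * DT                       # first rectangular integration
    v = moving_average(threshold(v, th2), ma2)  # 2nd threshold, 2nd MA length
    d = np.cumsum(v) * DT                       # second rectangular integration
    return d
```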

Question: Currently, I am tuning the following 4 parameters by hand:

  • 1st moving average filter length (0–50; discrete)
  • 1st thresholding value, before the first integration (0–0.5; continuous)
  • 2nd thresholding value (0–0.5; continuous)
  • 2nd moving average filter length, before the second integration (0–50; discrete)

But I want to find one set of values that minimizes, on average, the error between the displacement measured by the accelerometer and the displacement measured by the camera.

Cost function: I am currently calculating the cost for each session as the squared difference between the areas under the two displacement curves. I have 12 recorded sessions. For one set of values of these 4 variables, the cost is calculated for each recording and summed.
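
In code, the cost looks roughly like this (again a sketch: it assumes each session is stored as a pair of the acceleration trace and the camera displacement trace on the same time base, and it reuses the pipeline sketch above):

```python
def session_cost(disp_sensor, disp_camera, dt=DT):
    """Squared difference between the areas under the two displacement curves,
    with the areas taken by the rectangular rule."""
    area_sensor = np.sum(disp_sensor) * dt
    area_camera = np.sum(disp_camera) * dt
    return (area_sensor - area_camera) ** 2

def total_cost(params, sessions):
    """Sum the per-session cost over all recordings for one parameter set.
    `sessions` is assumed to be a list of (accel, disp_camera) array pairs."""
    ma1, th1, th2, ma2 = params
    cost = 0.0
    for accel, disp_camera in sessions:
        disp_sensor = accel_to_displacement(
            accel, ma1=int(round(ma1)), th1=th1, th2=th2, ma2=int(round(ma2))
        )
        cost += session_cost(disp_sensor, disp_camera)
    return cost
```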

I am new to numerical optimization and don't know much about which method to use to find optimal values for these 4 parameters given this cost function. Currently, I am reading the SciPy documentation, but I am having a hard time finding the appropriate method for my use case.
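
For example, I can already wrap the cost so that a SciPy optimizer could drive it; here `differential_evolution` is only a placeholder to show the interface (rounding the filter lengths inside the cost is my workaround for the discrete parameters), and choosing the right solver is exactly what I am asking about:

```python
from scipy.optimize import differential_evolution

# Box bounds for (ma1, th1, th2, ma2); the filter lengths are rounded to
# integers inside total_cost from the sketch above.
bounds = [(1, 50), (0.0, 0.5), (0.0, 0.5), (1, 50)]

# `sessions` is the list of (accel, disp_camera) pairs for the 12 recordings.
result = differential_evolution(total_cost, bounds, args=(sessions,), seed=0)
print(result.x, result.fun)
```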

Any help/direction would be appreciated.


Solution 1:

I have some thoughts about your application, too voluminous for a Comment but not definitive enough to claim I can "answer" the Question. But let's give it a try!

The steps you are taking with the measured accelerometer data are intended to improve accuracy (agreement with a camera-based measurement), but they all have the effect of smoothing the data. My thoughts run in two directions: first, that smoothing by (1) moving averages, (2) thresholding, and (3) integration might be redundant(!); and second, that it might not contribute, in spite of possible redundancy, as much to the goal of accuracy as it might to some other goal (such as robustness).

The essential step is that of integration, which of course we do twice. If a test "harness" is written to allow various moving averages, thresholds, and integration parameters, then the base case to run (against all twelve test datasets) is integration alone. Double integration is going to smooth out noise that is small and unbiased. In any case, it would be important to know whether the noise in the accelerometer measurements is too large or too biased to be handled by integration alone. The level of accuracy achieved by integration alone sets a benchmark against which the moving average and threshold parameters can be judged.
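
A minimal sketch of that base case, assuming the 70-samples-per-second rectangular rule described in the question, would be:

```python
import numpy as np

def double_integrate(accel, dt=1.0 / 70.0):
    """Rectangular-rule double integration with no smoothing or thresholding."""
    velocity = np.cumsum(accel) * dt
    displacement = np.cumsum(velocity) * dt
    return displacement
```

Running this over all twelve sessions with the same area-difference cost gives the benchmark before any moving average or threshold is added.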

However, the benefits of those extra parameters are not necessarily limited to improving accuracy. The reliability of an algorithm can incorporate not only its accuracy but also its robustness, meaning its ability to work in the presence of uncommon kinds of errors. The twelve test datasets you've marshalled may not be dissimilar enough to reveal actual experience with such errors, but we can think of some circumstances that might "fool" an algorithm. For example, suppose that the acceleration is typically rapid, but in some small number of cases the acceleration is so slow that a set threshold removes important information. Whether such an outcome is harmful depends on the application's goals, but it illustrates the importance of considering the cost of "failure" in adjusting parameters.

In framing the problem as an optimization, you chose a least-squares measure of error. Such a criterion is appropriate when errors are small and unbiased. However, when errors are not homogeneous in this sense, least-squares fitting of parameters is known to be unduly influenced by any data points that exhibit larger errors, the so-called outliers. Other measures of error should be considered if that kind of behavior is present, especially the 1-norm (sum of absolute values of errors). Measures of error can also be adapted for circumstances in which the consequences of under- and overestimating the outcome are not symmetric.
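
To illustrate, the per-session errors (the area differences in your cost) could be aggregated with either norm; this is only a sketch of the comparison, not a recommendation for your application:

```python
import numpy as np

def aggregate_cost(session_errors, norm="l2"):
    """Combine per-session errors into a single cost.
    The 2-norm squares each error, so an outlier session dominates;
    the 1-norm weighs every session linearly and is more robust."""
    e = np.asarray(session_errors, dtype=float)
    if norm == "l2":
        return float(np.sum(e ** 2))
    return float(np.sum(np.abs(e)))
```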