# Boxcar Averaging

I have received the following request from the departmental statistician of the faculty where I am doing my PhD, and I hope you can provide the information he has requested.

The data I'm collecting is sampled at 1-second intervals, and the average is taken over 900 samples (15 minutes).

Thanks for the info re: averaging via a boxcar function. This option is better than the usual running average but there still is a possible catch when dealing with auto-correlated observations.

As per the attached figure below, there is a sequence of observations (one per time step). The blue rectangle represents the 'support' of the boxcar function for the first 15-minute window. All observations within this window form the 1st averaged data point. Those observations within the support of the second boxcar function (red rectangle) form the 2nd averaged data point.
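To make the hypothesised procedure concrete, here is a minimal Python sketch of averaging over consecutive, non-overlapping boxcar windows. The function name `boxcar_averages` and the toy data are illustrative assumptions; whether the instrument actually averages this way is exactly what needs to be confirmed.

```python
def boxcar_averages(samples, width):
    """Average consecutive, non-overlapping windows of `width` samples.

    Hypothetical helper: an incomplete trailing window is dropped.
    """
    n_windows = len(samples) // width
    return [
        sum(samples[i * width:(i + 1) * width]) / width
        for i in range(n_windows)
    ]


# Toy data: 10 samples with a window of 5 (the real window is 900 samples).
print(boxcar_averages([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # [3.0, 8.0]
```

Each observation contributes to exactly one average, which is what distinguishes this from a running average where the windows overlap.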

Now assume a lag 1 autocorrelation (for the sake of example), so that we can expect the previous observation to provide information on the value of the current observation. Observe in the figure that the observation at the end of the 1st boxcar and the first observation of the 2nd boxcar are coloured green. This is to indicate that they share significant information in common, yet they are used in different boxcar functions to produce averaged data values.

In this case there is still 'information leakage'.

What exactly this phenomenon will do to the averages is difficult to say; it really depends on the underlying 'structure' of the signal. If the signal is highly irregular or chaotic, with large fluctuations in magnitude, and the amount of 'correlation contaminated' data is relatively high compared to the width of the boxcar function, then the average estimator could exhibit 'high variance' and actually be quite unstable itself. The end result will be trouble (likely greater uncertainty) when trying to estimate the response surfaces. If possible, such situations should be avoided.

So I need you to check to see if the averaging procedure via the box car function is as hypothesised for consecutive averages. It might not be but please check.

If it is, then in the ideal world we would obtain some trial data across the range of 'representative' conditions and then estimate the time lag for which observations are 'significantly' correlated.
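One way this lag might be estimated from trial data is sketched below. It computes the sample autocorrelation and returns the largest lag X for which every lag from 1 to X exceeds an approximate 95% white-noise bound of 1.96/sqrt(n). That threshold, and the names `autocorrelation` and `significant_lag`, are assumptions made for illustration; the letter does not say which significance criterion is intended.

```python
import math


def autocorrelation(samples, lag):
    """Sample autocorrelation at `lag` (assumes the series is not constant)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples)
    cov = sum(
        (samples[i] - mean) * (samples[i + lag] - mean)
        for i in range(n - lag)
    )
    return cov / var


def significant_lag(samples, max_lag):
    """Largest lag X such that lags 1..X all exceed the assumed bound."""
    bound = 1.96 / math.sqrt(len(samples))  # assumed 95% white-noise bound
    lag = 0
    for k in range(1, max_lag + 1):
        if abs(autocorrelation(samples, k)) <= bound:
            break
        lag = k
    return lag
```

For example, `significant_lag(trial_data, 60)` would search the first 60 lags of a hypothetical `trial_data` series. In practice this would be repeated across the range of representative conditions, since the correlation structure may differ between them.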

We could then clip 'X' observations (for an autocorrelation lag of X) from the start/end of each boxcar function when obtaining the average values.
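A rough sketch of the clipping idea follows. Because "start/end" can be read as either end or both, the hypothetical function `clipped_boxcar_averages` takes separate `clip_start` and `clip_end` counts; these names and the toy data are assumptions for illustration only.

```python
def clipped_boxcar_averages(samples, width, clip_start, clip_end):
    """Average consecutive windows after discarding edge observations.

    Drops `clip_start` samples from the start and `clip_end` samples from
    the end of each full window before averaging what remains.
    """
    if clip_start + clip_end >= width:
        raise ValueError("clipping would remove the whole window")
    averages = []
    for start in range(0, len(samples) - width + 1, width):
        kept = samples[start + clip_start:start + width - clip_end]
        averages.append(sum(kept) / len(kept))
    return averages


# Toy data: window of 5, one sample clipped from each end of every window.
print(clipped_boxcar_averages(list(range(1, 11)), 5, 1, 1))  # [3.0, 8.0]
```

Clipping X samples from one end already leaves a gap of X observations between the data retained in adjacent windows; clipping both ends doubles that gap at the cost of fewer samples per average.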

Let's find out first if the averaging is in fact performed by taking consecutive boxcar windows (as displayed) and then a strategy can be formulated from there. Maybe we have got enough data already via your website?

Being concerned about averages is highly technical but it will be important if we are to estimate these response surfaces correctly.

Kind Regards

Helmut

box-car-average.pdf