Replacing Outliers

 

If you want to identify outliers and replace them with a specific value, binary series give a quick and efficient result. The expression (W1 > 10.0) yields a series that is 1.0 wherever W1 is greater than 10.0 and 0.0 elsewhere. Multiply this by the desired replacement value:

 

 (W1 > 10.0) * -1.5

 

to get a series that is -1.5 at each replacement location and 0.0 elsewhere. This is close, but it is only half of the solution. The other half is to retain the points in W1 that are not outliers (i.e. W1 <= 10.0):

 

 (not(W1 > 10.0) * W1)

 

returns a series that is the same as W1 everywhere W1 does not exceed 10.0, and 0.0 elsewhere. If you add the two expressions together, you get the desired result:

 

 ((W1 > 10.0) * (-1.5)) + (not(W1 > 10.0) * W1)
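
As a plain-Python illustration of the same binary-series arithmetic (this is not DADiSP code), the mask, its complement, and their weighted sum can be computed point by point. The values in w1 are hypothetical sample data standing in for the series in Window 1:

```python
# Binary-series replacement sketched with plain Python lists (not DADiSP).
# w1 is hypothetical sample data standing in for the series in Window 1.
w1 = [2.0, 11.5, 3.0, 14.0, 4.5]
threshold = 10.0
replacement = -1.5

# (W1 > 10.0): 1.0 where the point is an outlier, 0.0 elsewhere
mask = [1.0 if y > threshold else 0.0 for y in w1]
# not(W1 > 10.0): the complement, 1.0 where the point is kept
keep = [1.0 - m for m in mask]

# (mask * replacement) + (keep * w1), evaluated point by point
result = [m * replacement + k * y for m, k, y in zip(mask, keep, w1)]
print(result)  # [2.0, -1.5, 3.0, -1.5, 4.5]
```

Exactly one of the two weights is 1.0 at every point, so each output value comes either from the replacement value or from the original series, never a mix.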

 

To generalize with a macro:

 

#define replace(s, cond, val) (((cond)*(val)) + ((not(cond))*(s)))

 

where s is the input series, cond is the condition for replacement, and val is the desired replacement value.
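
The macro can be mirrored as a small Python function for experimenting outside DADiSP. This is a sketch only. The name replace and the example data are illustrative, and cond is assumed to be a 0/1 (or boolean) list the same length as s:

```python
def replace(s, cond, val):
    """Python mirror of the replace() macro (illustrative only).

    s    -- input series (list of floats)
    cond -- 0/1 or boolean series marking points to replace
    val  -- replacement value
    """
    if len(s) != len(cond):
        raise ValueError("s and cond must have the same length")
    # ((cond) * (val)) + ((not(cond)) * (s)), evaluated point by point
    return [c * val + (1 - c) * x for x, c in zip(s, cond)]


# Usage: a different condition, replacing negative points with 0.0
s = [1.0, -2.0, 3.0, -0.5]
cond = [x < 0.0 for x in s]
print(replace(s, cond, 0.0))  # [1.0, 0.0, 3.0, 0.0]
```

Because the condition is passed in as its own series, the same helper handles any threshold or comparison, just as the macro does.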

 

What happens if you use NAVALUE for the replacement value? Try:

 

 ((W1 > 10.0) * (navalue)) + (not(W1 > 10.0) * W1)

 

Points greater than 10.0 drop out of the display of the series, but the series keeps its original length and x-axis because each NAVALUE serves as a placeholder.
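
Outside DADiSP, a literal port of this multiply-and-add form does not carry over to IEEE 754 floating point. NaN propagates through multiplication, and even 0.0 * NaN is NaN, so every point would become NaN. The plain-Python sketch below therefore selects per point instead. It uses math.nan as a stand-in placeholder for NAVALUE, and w1 is hypothetical sample data:

```python
import math

# Hypothetical sample data standing in for Window 1
w1 = [2.0, 11.5, 3.0, 14.0, 4.5]
threshold = 10.0

# In IEEE 754 arithmetic 0.0 * nan is nan, so the mask-multiply form
# would poison every point; select per point instead.
assert math.isnan(0.0 * math.nan)
masked = [math.nan if y > threshold else y for y in w1]

# The length is unchanged, so each kept value stays at its original x position
print(len(masked))  # 5
print(masked)       # [2.0, nan, 3.0, nan, 4.5]
```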

 

To replace the outliers with a linear interpolation of the surrounding points instead, consider the following:

 

 delete(W1, W1 > 10.0)

 

returns a series of the y-values which are not outliers, i.e. W1 <= 10.0.

 

 delete(xvals(W1), W1 > 10.0)

 

returns a series of the x-values where the y-values of W1 are not outliers.
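
The two delete steps can be sketched in plain Python as filters that use the same condition, which keeps the surviving x- and y-values paired. The sample data and the uniform spacing deltax = 0.5 are hypothetical. The sketch does not reproduce DADiSP's xvals; it builds the x-values of an evenly spaced series directly:

```python
# Hypothetical sample data: y-values at a uniform x spacing of 0.5
w1 = [2.0, 11.5, 3.0, 14.0, 4.5]
deltax = 0.5
xvals = [i * deltax for i in range(len(w1))]
threshold = 10.0

# delete(W1, W1 > 10.0): keep only the y-values that are not outliers
y_kept = [y for y in w1 if not y > threshold]
# delete(xvals(W1), W1 > 10.0): keep the x-values at those same points
x_kept = [x for x, y in zip(xvals, w1) if not y > threshold]

print(x_kept)  # [0.0, 1.0, 2.0]
print(y_kept)  # [2.0, 3.0, 4.5]
```

The kept x-values are no longer evenly spaced. That is why the next step pairs them with the y-values as an XY series rather than treating the result as an ordinary interval series.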

 

 xy(delete(xvals(W1), W1 > 10.0), delete(W1, W1 > 10.0))

 

creates an XY plot with the outliers removed from both the x- and y-values. DADiSP graphically "connects the dots" in the XY plot, so the plot appears to interpolate between the points on either side of each removed outlier, but no extra points have been added. To insert the interpolated points, use:

 

 xyinterp(delete(xvals(W1), W1 > 10.0), delete(W1, W1 > 10.0))
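
The sketch below does not reproduce XYINTERP itself. It only illustrates the underlying idea in plain Python: piecewise-linear interpolation through the kept (x, y) pairs, evaluated back at the original evenly spaced x positions so that the removed points are filled in. The helper name interp_linear, its clamping at the ends, and the sample data are all hypothetical:

```python
import bisect


def interp_linear(x_kept, y_kept, x_query):
    """Hypothetical helper: piecewise-linear interpolation through
    (x_kept, y_kept). x_kept must be strictly increasing and non-empty;
    queries outside its range are clamped to the end values."""
    out = []
    for x in x_query:
        if x <= x_kept[0]:
            out.append(y_kept[0])
        elif x >= x_kept[-1]:
            out.append(y_kept[-1])
        else:
            # x_kept[j - 1] <= x < x_kept[j]
            j = bisect.bisect_right(x_kept, x)
            x0, x1 = x_kept[j - 1], x_kept[j]
            y0, y1 = y_kept[j - 1], y_kept[j]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out


# Hypothetical sample data at a uniform x spacing of 0.5
w1 = [2.0, 11.5, 3.0, 14.0, 4.5]
xvals = [i * 0.5 for i in range(len(w1))]
keep = [y <= 10.0 for y in w1]
x_kept = [x for x, k in zip(xvals, keep) if k]
y_kept = [y for y, k in zip(w1, keep) if k]

# Kept points reproduce themselves; removed points are filled by interpolation
filled = interp_linear(x_kept, y_kept, xvals)
print(filled)  # [2.0, 2.5, 3.0, 3.75, 4.5]
```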

 

Finally, the OUTLIER function combines these ideas into a single, simple outlier-removal function.