Is your Outlier Correction outstanding?
Introduction
Outlier correction is commonly perceived as a necessary but tiresome, recurring task for a demand planner. The advantages of performing the outlier correction as a pre-processing step in the Forecasting process are nevertheless evident:
- You would get more clarification on which portion of your demand history is due to special marketing activities (promotions, price changes, etc.) of your company or your competitors.
- You won’t need to define your forecasting algorithms that robust anymore to cope with unusual demand peaks and depths.
- The forecasting algorithms’ output will become more plausible and lowers the planner’s effort of cross-checking the forecasting figures.
So, what is an outlier at all? Outliers are historical data points beyond a predefined tolerance range. A tolerance range can be defined in various ways, such as +/- 20% on the arithmetic mean or 2 times the standard deviation.
SAP IBP offers as an integral part of the Demand module the Outlier Correction as a pre-processing step for statistical forecasting.
Basically, there are two features available to finetune the algorithm:
- Outlier Detection Method: here you choose which method should be applied to detect the outlier
- Outlier Correction Method: here you decide what should happen with the outlier, if the value shall be changed at all, and if yes to which extent.
The following Outlier Detection Methods are available:
# | Method | Description |
1 | Interquartile Test | The data is split into quartiles and then calculates the upper and lower bounds with a multiplier (details see below). |
2 | Variance Test | The tolerance range is defined by the arithmetic mean +/- the standard deviation times a multiplier (usual range of multiplier is 1.5 to 3). |
The following outlier correction methods are available in SAP IBP:
# | Method | Description |
1 | Correction with Mean Excluding Outliers | The system replaces outliers with the mean of all key figure values calculated for the historical periods, and doesn’t consider the outliers for the calculation. This is the default outlier correction method. |
2 | Correction with Mean | The system replaces outliers with the average of all key figure values calculated for the historical periods. |
3 | Correction with Median Excluding Outliers | The system replaces outliers with the median of all key figure values calculated for the historical periods, and doesn’t consider the outliers for the calculation. |
4 | Correction with Median | The system replaces outliers with the median of all key figure values calculated for the historical periods. The outliers are excluded from the calculation. |
5 | Correction with Tolerance Excluding Outliers | The system recalculates the tolerance interval without taking the outliers into consideration, and changes the outliers so that they are at the limits of the new tolerance interval. |
6 | Correction with Tolerance | The system recalculates the tolerance interval and changes the outliers so that they are at the limits of the new tolerance interval. |
7 | No Correction | Outliers are not corrected; outlier detection is used for information purposes only. |
Outlier Detection Methods
As stated above, there are two outlier detection methods available: Interquartile Test (referred to as Interquartile Range or Midspread as well) and Variance Test.
The Interquartile Test (IQT) is measuring the statistical distribution as the difference between the 75^{th} and the 25^{th} percentiles. What does this mean? First, the whole data series is split into four equal groups, each containing 25% of the data observations. As next step, the quartiles are determined as such:
- The first quartile (Q1) splits the lowest 25% of data from the highest 75% (therefore, it’s also called the 25th percentile).
- The second quartile (Q2) cuts the data into 2 equal halves (therefore, it’s also called the median or the 50th percentile).
- The third quartile (Q3) splits the highest 25% of data from the lowest 75% (therefore, it’s also called the 75th percentile).
- The interquartile range (IQR) is the difference between the third and first quartiles. (IQR = Q3 – Q1)
To detect the outliers using the interquartile range method, the system calculates a lower and an upper bound using the first (Q1) and the third (Q3) quartile:
- Lower bound = Q1 – Multiplier × IQR
- Upper bound = Q3 + Multiplier × IQR
In below graph (Source: Wikipedia) you can see the graphical representation of the Interquartile Range (IQR) using a multiplier of 1.5:
The Variance Test is constructed a little bit simpler. First, it calculates the mean value of the selected data series, then it adds or subtracts the standard deviation times a multiplier:
- Lower bound = Mean – Multiplier × standard deviation
- Upper bound = Mean + Multiplier × standard deviation
Any value beyond these bounds are then considered as outliers. Therefore, choosing the right multiplier value is important (usually between 1.5 and 3, see as well the graph for reference).
SAP recommends to take advantage of the Interquartile Test as the method of choice. Compared to the IRT the Variance Test looks sort of inflexible: the mean value as basis for calculation of the upper and lower bounds may not be sufficient in case of highly volatile data series. We will come back to this later.
Outlier Correction Methods
The correction methods work either with the (1) mean or the (2) median value or are based on the (3) tolerance interval. Each of the three can do the calculation considering the outliers or ignoring them.
Let us run some tests with a default time series; input key figure: “Actuals Qty” (Blue line in graph), output key figure: “Actuals Qty Adj.” (Green line in graph). First we execute the Methods with the preceding outlier detection method Interquartile Test (IQT):
Method “Correction with Mean Excluding Outliers”
Arithmetic mean is roughly 2.000, so the algorithm corrected the outlier in week 12 / 2018 down to the mean value.
Method “Correction with Mean”
Correction to arithmetic mean including the outliers results only in slightly different value compared to previous method. The adjusted value in week 12 / 2018 is higher for the simple reason that extreme outliers are included into the mean calculation.
Method “Correction with Tolerance Excluding Outliers”
Now, the newly calculated value for the outlier correction in week 12 / 2018 deviates significantly from previous runs.
The calculated standard deviation is 2.180, mean value 2.022, assumed multiplier is round about 2.Method “Correction with Tolerance”
Method “Correction with Tolerance”
Execution of method “Correction with Tolerance” (including the outliers) leads to a much more relaxed tolerance interval. For this reason it has ignored our previous outlier example, while it continues to “cut” high peaks which exceeds the upper bound.
Method “Correction with Median Excluding Outliers”
Here the value has been adjusted to the mean value, not considering the outliers.
Method “Correction with Median”
Here the calculated median, to which the outlier has been adjusted is higher as outliers have been included into the median calculation.
Running the alternative outlier detection method “Variance Test” upfront, leads to very different results in the outlier correction, in some case more reasonable ones.
For example, the method Correction with Tolerance will result in following:
The upper bound has been set to 5.227, to which then the outlier has been reduced to.
Running the method Correction with Mean results in “volcano” style time series and cuts the outlier peaks down to the mean:
Summary
- The Outlier Detection Method and the Outlier Correction Method are interacting in an efficient way and result in most cases in reasonable outputs.
- Both methods are comprehensible and behave in the expected way.
- Dependent on the nature of your sales history you may choose the IQT or the Variance Method
- Correction to mean makes probably only sense in those cases where the volatility of the data is low, or the median is close to the mean.
Alternative Outlier Correction
The disadvantage of using the arithmetic mean is that it is calculated as a constant value and so are the upper and lower boundaries. In case of seasonal patterns in the sales history this may lead to unwanted effects such as ignoring repetitive demand peaks (see below Excel example):
Alternatively, there is the possibility in SAP IBP to apply a more dynamic time series as basis for the boundary calculation such as Ex-post Forecast. In the next step, you calculate the standard deviation directly in a key figure (requires some configuration effort to achieve this) or via a local member in your planning view. Finally you configure the key figure calculations for the upper and lower boundaries and the automated correction of the outliers.
The result is that outliers are corrected in a way to respect trend and season in the historic data patterns (see below Excel example):
Hi,
Good article on corrections and just wants to know SAP HANA prdective analysis library is used for both demand planning ( pre-processing & Postprocessing) & Demand sensing (Post Processing).
Thanks & Regards
Sree
Hello Srinivas,
for Demand Planning for sure, for Demand Sensing I will need to explore that a little bit deeper first.
Thanks & best regards
Christian
Hi would you mind letting me know which hosting company you’re working with? I’ve loaded your blog in 3 completely different browsers and I must say this blog loads a lot faster then most. Can you suggest a good internet hosting provider at a reasonable price? Kudos, I appreciate it!
Strato
Like!! I blog frequently and I really thank you for your content. The article has truly peaked my interest.