##### SAP IBP Alerts with Machine Learning

**Introduction**

Planning processes in SAP IBP that use Machine Learning (ML) algorithms, such as those included in the SAP HANA Predictive Analysis Library (PAL), exist in the current IBP release, though only to a limited extent. The most obvious places in IBP where machine learning plays a role are:

- Clustering of *Alerts* with ML clustering algorithms
- Forecasting model *Gradient Boosting of Decision Trees*

Before we go into the details of the functionalities mentioned above, we would like to dive a little deeper into what ML offers in general and what this means for IBP.

**What is Machine Learning (ML)?**

A good definition of Machine Learning can be found on Wikipedia:

*“Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.”*

Consider a plot of data points, e.g. spring temperatures over a decade at a specific location. The algorithm is supposed to predict the spring months of next year based on historical values, including outliers and unexpected highs and lows. A standard approach would be to use linear regression to find a “data-fitting” line that minimizes the sum of squared residuals (the deviations of the observed values from the values predicted by the model). The figure below shows an example of a data-fitting line generated with linear regression:
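To make the least-squares idea concrete, here is a minimal Python sketch of ordinary least squares for a single predictor. The function names are illustrative, and the temperature values are hypothetical toy numbers, not real measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = cov(x, y) / var(x); this choice minimizes the sum of squared residuals.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Deviation of each observed value from the value predicted by the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical toy data: year index vs. mean spring temperature (not real measurements).
years = [0, 1, 2, 3, 4]
temps = [10.0, 11.0, 10.5, 12.0, 12.5]
a, b = fit_line(years, temps)
next_year_forecast = a + b * 5   # extrapolate the fitted line one year ahead
```

For this toy data the fitted line has intercept 10.0 and slope 0.6, so the extrapolated value for the next year is 13.0. The residuals of a least-squares line always sum to (approximately) zero, which is why large outliers pull the whole line towards them.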

Here, the task for ML would be to identify patterns or data clusters and derive from them (in the case of time series) predictions for the future, without the planner having to adjust algorithm parameters beforehand or spend a lot of time correcting historical outliers.

**ML for Management-by-Exception**

Management-by-Exception is considered an important part of the overall planning process in Supply & Operations Planning. Alerts are meant to indicate specific frictions along the supply chain, such as material shortages, due date violations, or KPI values outside the tolerance interval; they help the supply chain planner focus on the points that need attention. Tools like the Alert Monitor in SAP IBP (and formerly in SAP APO) are intended to channel all these issues efficiently.

Unfortunately, the high number of alerts often prevents the planner from seeing the wood for the trees. This may be due to missing filters or too many alert types, but in some cases it is simply hard to define filters that do not miss the important alerts, because there are just too many alerts.

To address this, SAP IBP introduces machine learning algorithms that help automate the filtering and clustering of supply chain alerts. Two ML algorithms, both generally classified as clustering algorithms, are available in the current IBP release:

- K-Means
- DBSCAN

The K-Means algorithm partitions n observations or data records into k clusters, in which each observation belongs to the cluster with the nearest centre (mean). The “centroids” of the clusters are recalculated iteratively until the algorithm converges, i.e. until none of the cluster assignments change.

Here is a good animation showing how K-Means defines the clusters through iterative refinement (source: https://en.wikipedia.org/wiki/K-means_clustering):

By Chire, own work, CC BY-SA 4.0
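The assign-and-update loop shown in the animation can be sketched in plain Python (Lloyd's algorithm). This is a generic textbook version for illustration only: the function name is hypothetical, it assumes randomly chosen initial centroids and an explicitly given k, and it says nothing about how IBP or PAL initializes the algorithm or determines k internally.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centres: k input points picked at random
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:               # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

On well-separated toy data such as `[(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]` with `k=2`, the loop ends with centroids 2.0 and 11.0, one for each group of three points. The result of K-Means generally depends on the initial centroids, which is why practical implementations use smarter seeding or several restarts.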

Using the K-Means algorithm only requires you to specify the key figure to work on; no additional parameters need to be maintained:

Example case in IBP:

We create a custom alert definition that flags capacity utilization above 100%:

First, we review the alerts without applying K-Means and get 98 capacity overload alerts:

With K-Means applied, the number of alerts immediately drops, as they are clustered into nine “alert bubbles”:
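The alert condition of this example (capacity utilization above 100%) can be sketched in Python. The record layout, field names, and function name are hypothetical stand-ins for the key figures used in the custom alert definition; this only illustrates the logic of the rule, not how IBP evaluates alerts internally.

```python
from dataclasses import dataclass


@dataclass
class CapacityBucket:
    resource: str        # hypothetical field names, not IBP attribute names
    period: str
    usage: float         # capacity consumption in the bucket
    supply: float        # available capacity in the bucket


def overload_alerts(buckets, threshold_pct=100.0):
    """Return (resource, period, utilization %) for every bucket above the threshold."""
    alerts = []
    for b in buckets:
        if b.supply <= 0:
            # No capacity at all: any usage is an overload (reported as infinity).
            if b.usage > 0:
                alerts.append((b.resource, b.period, float("inf")))
            continue
        utilization = 100.0 * b.usage / b.supply
        if utilization > threshold_pct:
            alerts.append((b.resource, b.period, utilization))
    return alerts
```

Each returned tuple corresponds to one raw alert; a clustering step such as K-Means would then group these raw alerts (for example by their utilization values) into a smaller number of bubbles.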

The DBSCAN algorithm requires some parameter maintenance when selected. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds a number of clusters *starting from the estimated density distribution* of the corresponding nodes, which means that points are assigned to a cluster based on the distances between them. Outliers are treated as “noise”.

DBSCAN requires a **key figure** on which the algorithm will be applied. Optionally, you can enter the **attributes** by which the key figure is clustered. These attributes come from the calculation level of the alert. They can be made visible to the user, but they can also be hidden. If you choose the **manual** option, the DBSCAN algorithm parameters are exposed and the user can overwrite default values such as *Attribute Clustering Method*, *Scan Radius*, and *Distance Method*.

Generally speaking, you are well advised to choose the automatic option and let the algorithm decide on the right parameters. Maintaining the parameters manually requires you to dive into the maths of the algorithm. There are plenty of good articles and helpful explanations on choosing the right parameters on the internet, but if you are not a mathematical genius (I am definitely not), this may still give you a hard time. Nevertheless, I have listed the parameters below.

SAP provides various parameters for the DBSCAN algorithm, along with some documentation:

*Hard clustering*

When this option is selected, the data is **pre-clustered** based on the Attributes field, and the algorithm is called separately for each cluster. The result is the aggregation of the results of all calls. This is the default behaviour when the Automatic option is selected. Since this option requires a separate DBSCAN call for each cluster defined in the Attributes field, it can be a lot slower to execute than soft clustering.
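The pre-clustering behaviour of hard clustering can be sketched as follows: group the records by attribute value, run DBSCAN separately on each group, and concatenate the outliers. This is a hypothetical Python illustration of the idea, not SAP's implementation; the record layout and function names are assumptions. The 1-D helper only determines DBSCAN's noise points: a point is a core point if at least `min_pts` values (itself included) lie within the scan radius `eps`, and it is noise if it is not a core point and no core point lies within `eps` of it.

```python
from collections import defaultdict


def dbscan_noise_1d(values, eps, min_pts):
    """Indices of DBSCAN noise points for 1-D data (min_pts counts the point itself)."""
    core = [sum(abs(v - w) <= eps for w in values) >= min_pts for v in values]
    return [
        i for i, v in enumerate(values)
        if not core[i] and not any(core[j] and abs(v - w) <= eps for j, w in enumerate(values))
    ]


def hard_clustering_outliers(records, eps, min_pts):
    """Pre-cluster by attribute, call DBSCAN once per group, aggregate the outliers.

    records: list of (attribute_value, key_figure_value) pairs (hypothetical layout).
    Returns (outlier records, number of DBSCAN calls).
    """
    groups = defaultdict(list)
    for attr, value in records:
        groups[attr].append(value)
    outliers, calls = [], 0
    for attr, values in groups.items():
        calls += 1                               # one separate DBSCAN run per group
        outliers += [(attr, values[i]) for i in dbscan_noise_1d(values, eps, min_pts)]
    return outliers, calls
```

The call counter makes the cost visible: there is one DBSCAN run per distinct attribute value, so the runtime grows with the number of attribute combinations, which matches the documentation's remark that hard clustering can be slower.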

*Soft clustering*

The DBSCAN algorithm **automatically** determines the clustering of the data based on the attributes provided in the Attributes field. When you select this option, the weighting of the clustering is determined by the Category Weight field. With soft clustering, all the data is sent to the DBSCAN algorithm in a single batch and the clustering is performed while the algorithm is running; this results in much faster execution than hard clustering.
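One way to picture the Category Weight in soft clustering is a distance function that combines the key figure difference with a penalty whenever the attribute values differ, so that all records can be sent to DBSCAN in one batch. The additive penalty and the function names below are assumptions for illustration; the exact way PAL applies the category weight may differ.

```python
def soft_distance(a, b, category_weight):
    """Distance between two (attribute_value, key_figure_value) records.

    Assumption for illustration: absolute key figure difference plus a penalty of
    category_weight whenever the categorical attribute values differ.
    """
    attr_a, x_a = a
    attr_b, x_b = b
    return abs(x_a - x_b) + (category_weight if attr_a != attr_b else 0.0)


def neighbours(records, index, eps, category_weight):
    """Indices of all records within eps of records[index] (the record itself included)."""
    return [j for j, r in enumerate(records)
            if soft_distance(records[index], r, category_weight) <= eps]
```

With `eps=1.0`, the records `("Plant A", 10.0)` and `("Plant B", 10.5)` are neighbours for `category_weight=0.2` (distance of about 0.7) but not for `category_weight=5.0` (distance 5.5). A large weight therefore approaches the strict separation that hard clustering enforces, while a small weight lets records with different attributes share a cluster.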

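DBSCAN requires two parameters: the **scan radius (eps)** and the **minimum number of points (minPts)** required to form a cluster. The algorithm starts with an arbitrary point that has not been visited. This point’s eps-neighbourhood is retrieved, and if it contains at least minPts points, a cluster is started. Otherwise, the point is labelled as noise. A cluster is expanded by adding the eps-neighbourhood of every point in it that itself has at least minPts neighbours, so a point first labelled as noise can still become part of a cluster later if it lies within the eps-neighbourhood of such a point.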

*Minimum points (minPts)*

Suppose the data provided to DBSCAN is 4, 4, 4, 10, and the scan radius is smaller than 6, so that 10 is not within reach of the 4s. If the minimum number of points for a cluster is set to 5, all values (4, 4, 4, 10) are outliers. If it is set to 2, only 10 is an outlier, because the three values of 4 are enough to form a cluster.

*Scan radius (eps)*

This is the maximum distance between two points for them to be considered neighbours, i.e. candidates for the same cluster. For instance, take the values 3 and 5, with a minimum of 2 points per cluster. If the Scan Radius is set to 1, both values are classified as outliers, since the distance between them is 2. If the Scan Radius is set to 3, the values 3 and 5 are close enough to form a cluster, and no outliers are detected.
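The procedure described above (visit an unvisited point, compare the size of its eps-neighbourhood with minPts, expand the cluster through core points, and label the rest as noise) can be sketched as a minimal DBSCAN in Python. This is a generic textbook version for illustration with a hypothetical function name; it assumes that minPts counts the point itself and that points at a distance of exactly eps are neighbours, conventions that may differ from PAL's.

```python
NOISE = -1


def dbscan(points, eps, min_pts, dist=lambda a, b: abs(a - b)):
    """Minimal DBSCAN. Returns one label per point: a cluster id (0, 1, ...) or NOISE.

    minPts counts the point itself; neighbours are points with dist <= eps.
    """
    labels = [None] * len(points)            # None = not visited yet

    def region(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                # may still become a border point later
            continue
        cluster += 1                         # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # former noise becomes a border point
            if labels[j] is not None:
                continue                     # already assigned: do not expand again
            labels[j] = cluster
            neighbourhood = region(j)
            if len(neighbourhood) >= min_pts:  # j is also a core point: expand through it
                queue.extend(neighbourhood)
    return labels
```

Under these assumptions the sketch reproduces both examples: `dbscan([4, 4, 4, 10], eps=1, min_pts=5)` labels every point as noise, `min_pts=2` leaves only 10 as noise, and for the values 3 and 5 with `min_pts=2`, `eps=1` yields two noise points while `eps=3` yields a single cluster. The `dist` argument accepts any of the distance methods listed below.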

Further **optional** parameters are:

*Thread number*

Running multiple threads at the same time can improve performance, but it adds load to the server during execution.

*Category weight*

This option is used when soft clustering is selected. It determines the importance given to the clustering by the attributes entered in the Attributes field. The DBSCAN clustering is performed at all levels, and outliers are determined and aggregated at each level.

*Distance method*

Specifies the method used to compute the distance between two points. The options are: Manhattan, Euclidean, Minkowski, Chebyshev, Standardized Euclidean, and Cosine.

*Minkowski power*

Used with the Minkowski distance method, where it sets the exponent p.
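The distance options listed above correspond to the standard textbook definitions sketched below in Python. Whether PAL uses exactly these forms (for example, cosine distance as 1 minus cosine similarity, or per-dimension variances for the standardized variant) is an assumption, and the function names are illustrative.

```python
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minkowski(a, b, p):
    """Generalizes Manhattan (p=1) and Euclidean (p=2); p is the Minkowski power."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Largest difference in any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


def standardized_euclidean(a, b, variances):
    """Euclidean distance with each squared difference divided by that dimension's variance."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


def cosine(a, b):
    """1 - cosine similarity: depends on the angle between vectors, not their length.

    Undefined for zero vectors (raises ZeroDivisionError).
    """
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))
```

For the points (0, 0) and (3, 4), Manhattan gives 7, Euclidean 5, and Chebyshev 4; Minkowski with power 1 or 2 reproduces the first two. Standardized Euclidean matters when key figures and attributes live on very different scales, since otherwise the dimension with the largest values dominates the scan radius.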

If you choose DBSCAN with automatic parametrization, the effect is again a strong reduction and clustering of alerts. First, we test with Soft Clustering:

Now we do the same with Hard Clustering, which, according to the documentation, is the default (and apparently recommended) option for automatic DBSCAN:

**Summary on ML for Alert Clustering**

In general, both ML algorithms, K-Means and DBSCAN, are useful for compressing large numbers of alerts in SAP IBP into a manageable set, which helps the supply chain planner focus on the problem spots. The algorithms are easy to handle in the alert definitions, and no significant deterioration of system performance has been observed.

Nevertheless, the quality of the alert cluster contents needs further investigation. Any comments or contributions on this subject are highly welcome!