Introduction

As of the current IBP release, planning processes in SAP IBP that use Machine Learning (ML) algorithms, such as those included in the SAP HANA Predictive Analysis Library (PAL), exist only to a manageable extent. The obvious spots in IBP where machine learning plays a role are:

  • Clustering of Alerts with ML clustering algorithms
  • Forecasting with the Gradient Boosting of Decision Trees model

Before we go into the details of the above-mentioned functionalities, let us dive a little deeper into the general offerings of ML and what they mean for IBP.

What is Machine Learning (ML)?

A good definition of Machine Learning can be found at Wikipedia:

“Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.”

Consider a plot of data points, e.g. spring temperatures over a decade for a specific location. The algorithm is supposed to generate a prediction for next year's spring months based on historical values, including the outliers and unexpected highs and lows. A standard approach would be to use linear regression to find a data-fitting line with the least squared residuals (the deviations of the observed values from the values predicted by the model). See below an example of a data-fitting line generated with the linear regression algorithm:
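As an illustration, the least-squares line can be computed with the closed-form formulas for slope and intercept. This is a pure-Python sketch on a made-up series of April mean temperatures, not the IBP implementation:

```python
# Illustrative least-squares linear fit: find y = a*x + b minimizing
# the sum of squared residuals (not the SAP PAL implementation).

def linear_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a straight line.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical April mean temperatures (in degrees C) over ten years.
years = list(range(2010, 2020))
temps = [8.1, 8.4, 7.9, 8.8, 9.0, 8.6, 9.3, 9.1, 9.5, 9.4]

a, b = linear_fit(years, temps)
prediction_2020 = a * 2020 + b   # extrapolate the fitted line one year ahead
print(round(prediction_2020, 1))
```

The fitted line smooths over the outliers, but note that a single straight line cannot capture seasonality or regime changes; this is exactly where ML approaches aim to add value.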

The task for ML here would be to identify patterns or data clusters and to deduce, in the case of time series, predictions for the future without requiring a planner to tune the algorithm parameters up front or to spend a lot of time adjusting outliers in the history.

ML for Management-by-Exception

Management-by-exception is considered an important part of the overall planning process in Sales & Operations Planning. Alerts are supposed to indicate specific frictions along the supply chain, such as material shortages, due date violations, or KPI values beyond the tolerance interval; they are meant to help the supply chain planner shed some light on points of attention. Tools like the Alert Monitor in SAP IBP, and formerly in SAP APO, aim to channel all these issues in an efficient way.

Unfortunately, the high number of alerts often prevents the planner from seeing the wood for the trees. This may be due to missing filters or numerous alert types, but in some cases it is simply hard to define appropriate filters without missing the important alerts, because there are just too many alerts out there.

SAP IBP introduces machine learning algorithms here to support automated filtering and clustering of the supply chain alerts. Two ML algorithms, both generally classified as clustering algorithms, are available in the current IBP release:

  • K-Means
  • DBSCAN

The K-Means algorithm partitions n observations or data records into k clusters in which each observation belongs to the cluster with the nearest centre, or mean. The "centroids" of the clusters are recalculated iteratively until none of the cluster assignments change, i.e. until the algorithm converges.
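The iterative refinement can be sketched in a few lines of Python. This is a minimal one-dimensional illustration of the textbook algorithm (Lloyd's algorithm), not the PAL implementation, and the utilization values are made up:

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Minimal 1-D K-Means sketch: alternate between assigning each point
    to its nearest centroid and moving each centroid to its cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # pick k initial centroids from the data
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: recalculate each centroid as the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: no assignment changes
            break
        centroids = new_centroids
    return sorted(centroids)

# Two obvious groups of capacity-utilization values (in %), purely made up.
utilization = [101, 103, 102, 104, 148, 151, 150, 149]
print(k_means(utilization, k=2))   # one centroid per group
```

With well-separated groups like these, the two centroids settle on the group means (102.5 and 149.5) after a couple of iterations.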

A good animation of how K-Means defines the clusters through iterative refinement can be found here (source: https://en.wikipedia.org/wiki/K-means_clustering):

[Animation: K-means convergence, by Chire (own work), CC BY-SA 4.0]

Using the K-Means algorithm only requires naming the key figure to work on; no additional parameter maintenance is needed:

Example case in IBP:

We define a custom alert definition to indicate capacity utilization beyond 100%:

First, we review the alerts without applying the ML algorithm K-Means and receive 98 capacity overload alerts:

With the K-Means algorithm, the number of alerts immediately decreases, as they are clustered into nine "alert bubbles":

The DBSCAN algorithm requires some parameter maintenance when selected. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds a number of clusters starting from the estimated density distribution of the corresponding nodes, which means that the assignment of points to a cluster is based on their distance from each other. Outliers are treated as "noise".

DBSCAN requires a key figure on which the algorithm is applied. Optionally, you can enter the attributes on which the key figure is clustered; these are the attributes that come from the calculation level of the alert. The attributes are optionally visible to the user but can be hidden as well. If you choose the manual option, the DBSCAN algorithm parameters are exposed, and the user can overwrite the default values, such as the attribute clustering method, scan radius, and distance method.

Generally speaking, you are well advised to choose the automated option and let the algorithm decide on the right parameters. Maintaining the parameters manually requires you to dive into the mathematics of the algorithm. There are quite a few good articles and handy explanations of how to choose the right parameters available on the internet, but if you are not a mathematical genius (I am definitely not), this may give you a hard time anyway. I have nevertheless listed the parameters below.

SAP provides various parameters for the DBSCAN algorithm including some documentation:

Hard clustering

When this option is selected, the data is pre-clustered based on the Attributes field, and the algorithm is called separately for each cluster. The result is the aggregation of the results of all calls. This is the default behaviour when the Automatic option is selected. Since this option requires a distinct DBSCAN call for each cluster defined in the Attributes field, it can be a lot slower to execute compared to soft clustering.

Soft clustering

The DBSCAN algorithm automatically determines the clustering of the data based on the attributes provided in the Attributes field. When you select this option, the weighting of the clustering is determined by the Category Weight field. With soft clustering, all the data is sent to the DBSCAN algorithm in a single batch, and the clustering is performed while the algorithm is running, which results in much faster execution than hard clustering.
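At its core, the difference between the two modes is a calling pattern, which can be sketched as follows. This is pure illustration; the `cluster` function and the plant attribute values are hypothetical stand-ins, not SAP code:

```python
from collections import defaultdict

def cluster(values):
    """Hypothetical stand-in for the actual clustering algorithm."""
    return sorted(values)

# Made-up alert records: (attribute value, key figure value).
records = [("PLANT_A", 101), ("PLANT_A", 150), ("PLANT_B", 103), ("PLANT_B", 148)]

# Hard clustering: partition the records by attribute first, then issue
# one clustering call per partition (more calls, hence slower).
by_attr = defaultdict(list)
for attr, value in records:
    by_attr[attr].append(value)
hard_results = {attr: cluster(vals) for attr, vals in by_attr.items()}

# Soft clustering: a single call over all data; the attribute influences
# the result via a weight instead of a hard partition.
soft_result = cluster([value for _, value in records])

print(hard_results)
print(soft_result)
```

The trade-off is visible even in this toy: hard clustering guarantees that records with different attribute values never share a cluster, at the cost of one algorithm run per attribute value.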

DBSCAN requires two parameters: the scan radius (eps) and the minimum number of points (minPts) required to form a cluster. The algorithm starts with an arbitrary point that has not been visited. This point's eps-neighbourhood is retrieved, and if the number of points it contains is equal to or greater than minPts, a cluster is started. Otherwise, the point is labelled as noise.

Minimum points (minPts)

Suppose the data provided to DBSCAN is 4, 4, 4, 10. If the minimum number of points for a cluster is set to 5, the outliers will be 4, 4, 4, 10. If the minimum number of points is set to 2, the only outlier will be 10, because the three points with value 4 are considered a cluster.

Scan radius (eps)

This is the maximum distance between two points for them to be considered part of the same cluster. For instance, take the values 3 and 5. If the scan radius is set to 1, both values are determined to be outliers, as the distance between them is 2. If the scan radius is set to 3, the values 3 and 5 are considered close enough to be part of the same cluster, and no outliers are detected.
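Both examples can be reproduced with a minimal one-dimensional DBSCAN sketch. This is an illustration of the textbook algorithm, not the PAL implementation:

```python
def dbscan_1d(points, eps, min_pts):
    """Minimal 1-D DBSCAN sketch. A point is a "core" point if at least
    min_pts points (itself included) lie within eps of it; clusters grow
    from core points, and points reachable from no core point are noise."""
    n = len(points)
    labels = [None] * n              # None = unvisited, -1 = noise
    cluster_id = -1

    def neighbours(i):
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:      # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster_id += 1              # start a new cluster at this core point
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:                 # grow the cluster via density-reachability
            j = queue.pop()
            if labels[j] == -1:      # border point, previously marked as noise
                labels[j] = cluster_id
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:   # only core points expand further
                queue.extend(j_nbrs)

    clusters, noise = {}, []
    for value, label in zip(points, labels):
        if label == -1:
            noise.append(value)
        else:
            clusters.setdefault(label, []).append(value)
    return list(clusters.values()), noise

# minPts example from the text: with minPts=2, the three 4s form a cluster.
print(dbscan_1d([4, 4, 4, 10], eps=1, min_pts=2))   # ([[4, 4, 4]], [10])
# eps example from the text: with eps=3, the values 3 and 5 share a cluster.
print(dbscan_1d([3, 5], eps=3, min_pts=2))          # ([[3, 5]], [])
```

Note that, unlike K-Means, the number of clusters is not given up front; it emerges from eps and minPts.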

More optional parameters are:

Thread number

Running multiple threads at the same time can improve performance but adds load to the server during execution.

Category weight

This option is used when soft clustering is selected. It determines the importance of the clustering defined in the Attributes field. The DBSCAN clustering is performed at all levels, and outliers are determined and aggregated at each level.

Distance method

Specifies the method used to compute the distance between two points. The options are: Manhattan, Euclidean, Minkowski, Chebyshev, Standardized Euclidean, and Cosine.

Minkowski Power

Used with the Minkowski distance method.
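The relationship between these distance methods can be shown in a small sketch (plain Python, not SAP code): the Minkowski distance generalizes Manhattan (power 1) and Euclidean (power 2), and approaches the Chebyshev distance as the power grows:

```python
def minkowski(p, q, power):
    """Minkowski distance between two points p and q:
    the power-th root of the sum of |a - b| ** power over all coordinates."""
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1 / power)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))                      # power 1 = Manhattan: 7.0
print(minkowski(a, b, 2))                      # power 2 = Euclidean: 5.0
print(max(abs(x - y) for x, y in zip(a, b)))   # Chebyshev (limit case): 4
```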

If you go for DBSCAN with automated parametrization, the effect is likewise a strong reduction and clustering of alerts. First, we test with soft clustering:

Now we do the same with hard clustering, which seems to be the recommended way when choosing automated DBSCAN, according to the documentation:

Summary on ML for Alert Clustering

In general, applying the two ML algorithms, K-Means and DBSCAN, to alerts in SAP IBP is useful for compressing a large number of alerts to a manageable one, which in turn means good support for the supply chain planner in focusing on the problem spots. The algorithms are easy to handle in the alert definitions, and no significant deterioration of system performance has been observed.

Nevertheless, the quality of the alert cluster contents needs further investigation. Any comments or contributions on this subject are highly welcome!