ML based predictive analytics: To alert or not to alert?
By: Heiko Mannherz on Dec 1, 2021 11:00:00 AM
As the only AIOps platform for automating SAP operations, Avantra performs the daily tasks that steal your precious time. In this article we highlight how the Avantra Enterprise Edition helps you to anticipate potential issues, not merely react to them. It’s the next step beyond monitoring.
To alert or not to alert, that’s the question
At Avantra, we aim to create a platform to call your own, designed by Basis engineers for Basis engineers. With the new Edition, we leverage AI monitoring functions to cut the noise and proactively address issues while pinpointing the location of serious alerts. You deserve to be freed up to engage in more rewarding tasks - and drive further value.
Since the very first version of Avantra (Syslink Xandria at that point in time), the monitoring focus has always been on detecting critical trends rather than singular events. Many of the Avantra monitoring functions (we call them checks) evaluate conditions over a given period of time. One example is the number of short dumps, where a single one usually does no harm and doesn’t need immediate follow-up.
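To illustrate what evaluating a condition over a period of time can look like, here is a minimal Python sketch. It is not Avantra's implementation: the function names, the one-hour window and the limit of five short dumps are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from typing import Sequence


def count_in_window(timestamps: Sequence[datetime], now: datetime,
                    window: timedelta) -> int:
    """Count the events that happened within `window` before `now`."""
    return sum(1 for t in timestamps if now - window <= t <= now)


def short_dump_check(timestamps: Sequence[datetime], now: datetime,
                     window: timedelta = timedelta(hours=1),
                     max_dumps: int = 5) -> bool:
    """Return True (critical) only if more than `max_dumps` short dumps
    occurred inside the window. Both defaults are hypothetical values."""
    return count_in_window(timestamps, now, window) > max_dumps


now = datetime(2021, 12, 1, 11, 0)
# Two dumps inside the last hour, one older: not critical.
few_dumps = [now - timedelta(minutes=m) for m in (5, 20, 90)]
# Six dumps inside the last hour: critical.
many_dumps = [now - timedelta(minutes=m) for m in (1, 2, 3, 4, 5, 6)]
print(short_dump_check(few_dumps, now), short_dump_check(many_dumps, now))
```

A single dump never trips this check on its own, which is the behaviour described above.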
However, for all checks based on performance data there was no easy way to achieve this. And the prevalence of digital transformation requires a dynamic, data-driven understanding of system performance and business impact. And here it is, right out of our innovation lab: Predictive analytics.
What is predictive analytics?
Now, what is this about? Avantra applies machine learning (ML) based predictive analytics to time series data. Typical examples are CPU usage, disk usage, and tablespace usage. Monitoring activities based on time series data are often parameterized using thresholds, usually given as a percentage of the maximum available resources. This makes them easy to predefine and also easy to understand. And in Avantra, they are also easy to adjust at scale.
Nonetheless, there are situations where thresholds are too static, or become too strict. The intention of thresholds is to define a long-term separator between good and not so good. Usually a situation only becomes critical if a threshold is exceeded over a longer period of time. Whenever resource usage is spiky, or there are usage bursts, thresholds can be exceeded for a short period of time, although this does not constitute a critical situation.
The tricky bit from a technical perspective is to decide: is this only a spike, or is this the beginning of a longer-term problem? In the past, Avantra would alert you early on. Better safe than sorry, because sometimes it requires only a brief look by a human to clarify the situation. The trouble is when these situations occur during the night, or over the weekend. If you are on call, you may get hauled out of bed just to realize that the situation has recovered by the time you get there to verify. And believe us, we know exactly how this feels. We want you to sleep well at night.
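The difference between a short spike and a sustained breach can be made concrete with a small Python sketch. This is purely illustrative and assumes evenly spaced samples; the function names, the 90% threshold and the three-sample minimum are hypothetical.

```python
from typing import Sequence


def exceeds_now(samples: Sequence[float], threshold: float) -> bool:
    """Static check: fires as soon as the latest sample is above the threshold."""
    return samples[-1] > threshold


def exceeds_sustained(samples: Sequence[float], threshold: float,
                      min_consecutive: int) -> bool:
    """Fires only if the last `min_consecutive` samples are all above
    the threshold, i.e. the breach has lasted for a while."""
    if len(samples) < min_consecutive:
        return False
    return all(s > threshold for s in samples[-min_consecutive:])


cpu_spike = [42.0, 45.0, 97.0]            # one burst above 90%
cpu_sustained = [91.0, 93.0, 96.0, 97.0]  # above 90% for several samples

print(exceeds_now(cpu_spike, 90.0))                 # the static check fires on the spike
print(exceeds_sustained(cpu_spike, 90.0, 3))        # the sustained check does not
print(exceeds_sustained(cpu_sustained, 90.0, 3))    # but it fires on the long breach
```

The sustained variant avoids the false alarm, but it only looks backwards, so it reacts late to a breach that really is the start of a problem.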
The Avantra predictive analytics approach
The Avantra predictive analytics approach tries to remedy this exact problem. The Avantra agent applies ML based algorithms to understand usage patterns and to predict - at every given point in time - how the usage will evolve in the near future. These predictions are used to determine a trend whenever the agent evaluates a check, and this trend is taken into account in addition to the defined thresholds. Roughly speaking, if a threshold is exceeded and we predict the situation clears again within a short period of time, the alert will be suppressed or deferred to the next evaluation cycle. Only if the situation seems to get worse is the alert actually raised.
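The decision logic described here can be sketched in a few lines of Python. Note the assumptions: the article does not say which algorithms the agent uses, so a plain least-squares line through the most recent samples stands in for the ML model, and the names `predict_ahead` and `evaluate_check`, the window and the horizon are all hypothetical.

```python
from typing import Sequence


def predict_ahead(samples: Sequence[float], window: int, horizon: int) -> float:
    """Fit a least-squares line to the last `window` samples and extrapolate
    it `horizon` steps past the latest one. A stand-in for a real model."""
    recent = list(samples[-window:])
    n = len(recent)
    if n < 2:
        return recent[-1]  # not enough data for a trend
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + horizon)


def evaluate_check(samples: Sequence[float], threshold: float,
                   window: int = 3, horizon: int = 3) -> str:
    """Combine the threshold with the predicted trend (samples must be non-empty)."""
    if samples[-1] <= threshold:
        return "ok"
    if predict_ahead(samples, window, horizon) <= threshold:
        return "deferred"  # predicted to clear soon: re-evaluate next cycle
    return "alert"         # predicted to stay critical or get worse


print(evaluate_check([50.0, 55.0, 98.0, 95.0, 92.0], 90.0))  # spike that is recovering
print(evaluate_check([80.0, 85.0, 91.0, 94.0, 97.0], 90.0))  # usage that keeps growing
```

In this sketch the first series is above the threshold but trending down, so the alert is deferred, while the second keeps climbing and raises the alert immediately.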
How is predictive analytics used in SAP operations?
Predictive analytics helps Avantra to better understand usage spikes, or situations where the usage oscillates around a threshold (above, below, above, below, ...). The whole monitoring process becomes smoother and more adaptive. And it’s less likely that your sleep will be interrupted due to something that is not actually business critical.
Other solutions in the market, in particular the ones with a footprint in application performance management, claim to operate without thresholds, and to find critical conditions completely on their own. The technology is sometimes called anomaly detection. Avantra doesn’t follow this path for two reasons:
- Firstly, talking about anomalies transforms the problem of determining a critical system condition into a statistical problem. An anomaly is one or more values with a certain statistical significance. Humans tend to be bad at statistics which, in fact, can be pretty counterintuitive. That means you barely have the opportunity to understand what the anomaly detection is actually doing, and therefore there is often very little to parameterize. Basically, these systems are black boxes. Avantra should be a platform to call your own. And that is only possible if you stay in control!
- Secondly, Avantra is a platform that’s usually deployed on premise. That’s just what most of our customers prefer. And it’s a shared application with agents running the business logic, i.e. performing checks and kicking off automations. These agents run on the hosts of your most mission-critical SAP systems. There is no way Avantra would add the extra load to these agents that running heavy machine learning algorithms (the kind needed to detect anomalies) would require.
As a best-of-breed SAP monitoring, AI and automation platform, we have spent a great deal of our engineering capacity on finding efficient machine learning and optimization algorithms to do the job. We minimized the amount of data required to get reasonable results, and implemented storage-efficient transfer and handling of this data.
Sometimes it's a pity that so many of the exciting bits happen under the hood. However, if in the end it anticipates issues before they occur, reduces noise and pinpoints risk factors, we have done the right thing to free you from the mundane, so you can engage in more rewarding work.