Building a scalable real-time ML inference platform for AIOps

Author(s): Praveen Manoharan, Nilesh Nayan, Aaditya Sharma, Aravindakumar Venugopalan

Abstract

In this paper, we present a method for building a scalable, real-time inference platform for large-scale time-series anomaly detection and root-cause analysis, built as part of an AI for Operations (AIOps) tool. AIOps eases the manual, time-consuming work of DevOps engineers who monitor and troubleshoot production systems. Such a system must operate in real time, detecting anomalies across a large number of time-series metrics and logs from production systems so that it can raise timely alerts and suggest possible root causes for quick remediation; it therefore requires low-latency operation. It must also scale to the vast amounts of data processed by the ETL and ML inference jobs the solution needs. In this work, we show how we engineered and scaled an AI research proof of concept (POC) into a solution that supports a massive search engine system, achieving a 30x reduction in latency. We also evaluate several commonly used ML inference platforms, including Apache Airflow, a serverless REST API, and the Spark engine, and we report the improvements achieved along with our estimates of the feasibility and cost of each for an AIOps solution.

Association of Data Scientists
