LLM Based Agentic Framework to Assist with IT Incidents

Author(s): Chandan Kumar Agarwal, Aditi Raghuvanshi, Suresh S K, Sovan Gosh

When an incident or alert arises for a service, on-call personnel often need a significant amount of time to identify the root cause and then work toward a resolution. In some cases the issue is linked to other dependent services, which makes it even harder for the on-call engineer to pinpoint the problem. In this paper, we discuss a solution designed to help on-call personnel reduce Mean Time to Recovery (MTTR) through an agentic framework that functions as a conversational AIOps incident management bot. We describe our solutions to challenges such as recommending resources from various sources, utilizing features of previous similar incidents, and suggesting teammates to contact, all aimed at helping users identify the root cause and at providing resolution steps with appropriate references. The framework also supports additional automation tasks, including generating post-incident reports that cover all key incident timelines and creating Jira issues based on the identified next actions. We provide a comprehensive overview of the solutions developed for these recommendations and automation tasks, as well as the online and offline monitoring and evaluation steps taken to track quality and user engagement.

Access this Lattice journal:
