Vision-Powered RAG Agents for Organizational Software and Web Operations

Author(s): Varun Malhotra, Gaurav Adke, Ameya Divekar

Interest in agents based on large multimodal models (LMMs) is growing among academic researchers and industry developers. We demonstrate an autonomous LMM-based agent that uses enterprise knowledge to learn and perform software tasks independently. For this paper, we evaluated the agent on both internal and external websites to demonstrate its capabilities. This eliminates many repetitive web tasks otherwise done by staff. The system's novelty lies in automating decision-making with enterprise knowledge bases: the web agent uses retrieval-augmented generation (RAG) to construct a series of execution steps from LMM knowledge and organizational databases. We build a pipeline that generates action commands using HTML parsing that extracts each element's XPath and assigns a unique bounding-box ID so that IDs do not overlap. The agent also uses tool calling to evaluate its own execution of a task. Like a human, the agentic system makes decisions based on high-level guidance. Web-automation precision is about 40% on most open websites and higher on internal pages, where RAG over documentation is available. This innovation improved our accuracy by 24%. Beyond accuracy, the agent is also assessed on the number of steps taken, the total token cost, and self-reflection on its results.
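The HTML parsing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ElementIndexer` and `index_elements`, the set of tags treated as interactive, and the use of a simple running counter for IDs are all assumptions made for illustration. The sketch walks the markup with Python's standard `html.parser`, builds a positional XPath for each interactive element, and gives each one a distinct integer ID that could later label a bounding box in a screenshot.

```python
from html.parser import HTMLParser

# Assumption: which tags count as "interactive" is chosen here for illustration.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ElementIndexer(HTMLParser):
    """Hypothetical helper: maps a unique integer ID to each element's XPath."""

    def __init__(self):
        super().__init__()
        self.stack = []      # XPath steps of the currently open elements
        self.counts = [{}]   # per-depth counters of sibling tags seen so far
        self.elements = {}   # unique ID -> positional XPath

    def handle_starttag(self, tag, attrs):
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        step = f"{tag}[{siblings[tag]}]"
        xpath = "/" + "/".join(self.stack + [step])
        if tag in INTERACTIVE:
            # A running counter guarantees that no two elements share an ID.
            self.elements[len(self.elements) + 1] = xpath
        if tag not in VOID:
            self.stack.append(step)
            self.counts.append({})

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1].split("[")[0] == tag:
            self.stack.pop()
            self.counts.pop()


def index_elements(html: str) -> dict:
    """Return {unique_id: xpath} for the interactive elements in `html`."""
    parser = ElementIndexer()
    parser.feed(html)
    parser.close()
    return parser.elements


if __name__ == "__main__":
    page = ("<html><body><form><input name='q'><button>Go</button></form>"
            "<a href='/x'>X</a><a href='/y'>Y</a></body></html>")
    for element_id, xpath in index_elements(page).items():
        print(element_id, xpath)
```

In a full agent, the pixel coordinates of each bounding box would come from a browser-automation layer, which is outside the scope of this sketch. The IDs here only provide the stable, non-overlapping labels that an LMM could reference when emitting an action command.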

Access this Lattice Journal:
