From English to SQL: Bridging the Gap in Data Manipulation

How generative AI can democratize analytics by turning plain-English questions into SQL.

Nikhil Ahuja, Senior Assistant Vice President at EXL Analytics, took the stage at the Machine Learning Developers Summit (MLDS) 2024 in Bengaluru. With over 12 years of experience, Ahuja leads a team of data scientists developing digital solutions at EXL. His expertise lies in applying Generative AI, Deep Learning, and ML techniques to enhance customer experiences, with a focus on risk, marketing, and product strategy for global financial institutions.

Championing Generative AI: Beyond Natural Language

Ahuja began by addressing the prevailing dependency on coding for data analysis and insight generation, emphasizing the need to democratize the process. The crux of the challenge was enabling anyone who can phrase a question in English to manipulate data effectively. The focus therefore shifted to using Generative AI to produce technical content, specifically SQL queries, rather than conventional natural-language text.

Navigating Complexities: Unraveling the Landscape

The magnitude of the problem became apparent as Ahuja delved into the details. The team was working with over 500 variables across 10+ datasets, and complexity rose accordingly. Datasets presented varied levels of aggregation, different naming conventions, and multiple sources for similar solutions. The challenges were further compounded by the diverse nature of banking datasets, encompassing customer responses, acquisitions, demographics, and performance metrics.

Overcoming Hurdles: Building a Custom Framework

To address these challenges, the team built a custom RAG (retrieval-augmented generation) framework. Open-source language models proved sensitive to how questions were phrased and struggled to discern business context. Identifying the right context for variables and datasets, along with incorporating custom calculations, became pivotal. The framework therefore added a Knowledge Graph, offering a deeper understanding of variables and datasets, coupled with a custom scoring algorithm to enhance accuracy.
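The talk did not publish the Knowledge Graph schema or the scoring algorithm, so the following is only a minimal sketch of the general idea: store each variable with its dataset, business description, and synonyms, then rank variables against a user question. The `Variable` class, the toy `GRAPH` entries, the stopword list, and the scoring weights are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str          # physical column name
    table: str         # dataset the column lives in
    description: str   # business meaning of the column
    synonyms: list[str] = field(default_factory=list)

# Hypothetical toy graph; a real one would cover hundreds of variables.
GRAPH = [
    Variable("cust_id", "demographics", "unique customer identifier",
             ["customer", "client"]),
    Variable("acq_dt", "acquisitions", "date the account was acquired",
             ["acquisition date", "opened"]),
    Variable("resp_flag", "responses",
             "whether the customer responded to a campaign",
             ["response", "responded"]),
]

STOPWORDS = {"the", "to", "a", "was", "of", "how", "many"}

def score(query: str, var: Variable) -> float:
    """Assumed scoring rule: shared description words plus weighted synonym hits."""
    text = query.lower()
    q_tokens = set(text.split()) - STOPWORDS
    desc_tokens = set(var.description.lower().split()) - STOPWORDS
    overlap = len(q_tokens & desc_tokens)
    # Synonyms are matched as substrings, so "customer" also hits "customers".
    synonym_hits = sum(1 for s in var.synonyms if s in text)
    return overlap + 2.0 * synonym_hits

def top_variables(query: str, k: int = 2) -> list[Variable]:
    """Return up to k variables with a positive score, best first."""
    ranked = sorted(GRAPH, key=lambda v: score(query, v), reverse=True)
    return [v for v in ranked[:k] if score(query, v) > 0]

matches = top_variables("How many customers responded to the campaign")
print([(v.table, v.name) for v in matches])
```

Weighting synonym hits above plain word overlap is one way to cope with the differing naming conventions mentioned earlier; a production system would more likely use embeddings than token matching.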

The Flow: From Query to Insight

Ahuja outlined the framework’s flow, which starts with query mining, variable identification, and synonym detection. The aim is to generate a “focused Knowledge Graph” tailored to the specific user query. The subsequent steps enrich the prompt, generate the SQL query, and handle any errors. The end-to-end process, driven by open-source language models, produced a SQL query in about 15-16 seconds.
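The error-handling step in this flow can be sketched as a retry loop: compile the generated SQL against the database and, if it fails, feed the error message back into the next prompt. This is an illustrative assumption about how such a step might work, not the framework described in the talk. The functions `build_prompt`, `generate_valid_sql`, and `fake_model` are hypothetical, and `fake_model` is a stand-in for a real language model call.

```python
import sqlite3
from typing import Callable, Optional

def build_prompt(question: str, schema: str, error: Optional[str] = None) -> str:
    """Enrich the user question with schema context and any earlier error."""
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}\nReturn one SQL query."
    if error:
        prompt += f"\nThe previous attempt failed with: {error}"
    return prompt

def generate_valid_sql(question: str, schema: str, conn: sqlite3.Connection,
                       generate: Callable[[str], str],
                       max_attempts: int = 3) -> str:
    """Ask the model for SQL until it compiles or attempts run out."""
    error = None
    for _ in range(max_attempts):
        sql = generate(build_prompt(question, schema, error))
        try:
            # EXPLAIN QUERY PLAN compiles the statement without running it.
            conn.execute(f"EXPLAIN QUERY PLAN {sql}")
            return sql
        except sqlite3.Error as exc:
            error = str(exc)
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")

def fake_model(prompt: str) -> str:
    # Stand-in for a language model: wrong column first, fixed after an error.
    if "failed with" in prompt:
        return "SELECT COUNT(*) FROM responses WHERE resp_flag = 1"
    return "SELECT COUNT(*) FROM responses WHERE responded = 1"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (cust_id INTEGER, resp_flag INTEGER)")
schema = "responses(cust_id INTEGER, resp_flag INTEGER)"
sql = generate_valid_sql("How many customers responded?", schema, conn, fake_model)
print(sql)
```

Compiling the statement catches syntax errors and unknown tables or columns, but it cannot tell whether the query answers the question that was asked; that still needs the business context supplied in the prompt.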

Conversational AI: Empowering Users

The framework culminated in a Conversational AI tool, allowing users to interact seamlessly with the system. Pre-built Knowledge Graphs and vector stores facilitated swift responses. The tool showcased an ability to automate reporting, generate charts, and offer a fluid conversation experience, allowing users to pose dynamic, free-flowing questions.

Transformative Impact: Democratized Insights and Beyond

Early results indicated a substantial reduction in insight generation time, with ad hoc analysis taking 60-70% less time. Deploying the tool to users demonstrated a democratized approach to data and insight generation. Ahuja acknowledged areas for improvement, including fine-tuning language models and handling more fluid user queries.

The Road Ahead

Despite the successes, the team acknowledged the ongoing journey of improvement. Fine-tuning language models, incorporating more user suggestions, and building an intuitive UI for query suggestions were identified as key focus areas. The continuous loop of feedback and refinement showcased a commitment to evolving the tool for even greater usability and impact.

Nikhil Ahuja’s talk at MLDS 2024 marked a significant stride in the realm of data analysis, unraveling the potential of Generative AI to democratize insights, making data manipulation and analysis accessible to a broader audience. As the industry marches forward, the marriage of language models and custom frameworks promises a future where data-driven decision-making is within everyone’s reach.


Shreepradha Hegde

Shreepradha is an accomplished Associate Lead Consultant at AIM, showcasing expertise in AI and data science, specifically Generative AI. With a wealth of experience, she has consistently demonstrated exceptional skills in leveraging advanced technologies to drive innovation and insightful solutions. Shreepradha's dedication and strategic mindset have made her a valuable asset in the ever-evolving landscape of artificial intelligence and data science.
