
Innovative Approaches to Enhance Taxpayer Risk Prediction Using AI

Discover how language models can improve taxpayer risk prediction, helping to combat GST fraud and support robust financial governance.

In fiscal governance, the battle against tax fraud, especially Goods and Services Tax (GST) fraud, demands innovative solutions. At the Machine Learning Developers Summit (MLDS) 2024 in Bengaluru, Shubhradeep Nandi, a seasoned Data Scientist, shares insights into an approach that harnesses large language models (LLMs) to enhance taxpayer risk prediction.

Unveiling a Persistent Challenge

Tax fraud, particularly in the form of GST fraud, poses a substantial challenge for governments worldwide. With the potential loss of significant revenue, combating such fraudulent activities becomes imperative. Shubhradeep Nandi sheds light on the widespread occurrence of GST fraud across various regions in India, emphasizing the urgency of effective preventive measures.

The Need for Advanced Solutions

Traditional methods and linear mathematical formulas fall short in tackling the sophistication of tax fraudsters. Recognizing these limitations, Shubhradeep advocates for the integration of artificial intelligence, specifically large language models, to address the intricacies of GST fraud detection. The dynamic nature of tax fraud requires a nuanced and adaptive approach.

A Research Journey

Shubhradeep provides a glimpse into a research paper co-authored with Kalpita Roy, outlining a five-step methodology. The process involves a careful analysis of gaps, exploration of existing literature, and the introduction of an innovative method leveraging LLMs. The focus is on transforming textual data into structured profiles for effective taxpayer risk prediction.

Mathematics Behind the Model

The article delves into the mathematical details of the proposed method, emphasizing the use of large language models for profile construction and tuning. Shubhradeep breaks down the steps, from creating a two-dimensional matrix to implementing a compact classifier, explaining how the methodology bridges the gap between tabular data and fraud detection outcomes.
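To make the idea of profile construction more concrete, here is a minimal sketch of how rows of a two-dimensional matrix might be serialized into short text profiles that a language model could consume. It is only an illustration under stated assumptions: the column names, the sample values, and the "name is value" template are all hypothetical and are not taken from the talk or the paper.

```python
from typing import List, Sequence

# Hypothetical column names; the talk does not list the actual features.
COLUMNS = ["taxpayer_id", "turnover", "input_tax_credit", "late_filings"]


def row_to_profile(columns: Sequence[str], row: Sequence[object]) -> str:
    """Serialize one table row into a single line of 'name is value' clauses."""
    if len(columns) != len(row):
        raise ValueError("row length must match the number of columns")
    return "; ".join(f"{name} is {value}" for name, value in zip(columns, row))


def matrix_to_profiles(
    columns: Sequence[str], matrix: Sequence[Sequence[object]]
) -> List[str]:
    """Turn a two-dimensional matrix (list of rows) into text profiles."""
    return [row_to_profile(columns, row) for row in matrix]


# Made-up sample records, for illustration only.
matrix = [
    ["T001", 1200000, 150000, 0],
    ["T002", 300000, 290000, 4],
]

profiles = matrix_to_profiles(COLUMNS, matrix)
for profile in profiles:
    print(profile)
```

In a pipeline of this kind, each profile string would then be passed to a language model, and a small classifier would map the model's output to a fraud risk label. Those later steps are omitted here because the source does not describe them in enough detail.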

Experimental Validation

The research team’s experiments, run on an Nvidia A40 GPU, showcase the effectiveness of their approach. Benchmarked against traditional methods and deep learning models, the LLM-based approach stands out, achieving promising F1 scores in fraud detection.
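For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall, which makes it a common choice for imbalanced problems such as fraud detection. The sketch below computes it from scratch; the labels are made-up values for illustration and are not results from the research.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Compute F1 = 2 * precision * recall / (precision + recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = flagged as fraudulent, 0 = not flagged.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Here there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 score is 0.75 as well.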


Shubhradeep concludes by highlighting the significance of AI, specifically LLMs, in fiscal governance. The article emphasizes the need for a natural language system to interact with textual data effectively. Live testing of the model, in which it successfully identified fraud cases, underlines the practical applicability and potential impact of the approach.

In a landscape where tax fraud keeps evolving in complexity, Shubhradeep Nandi’s research presents a pioneering approach to fortifying fiscal governance. The integration of large language models signifies a shift in how the challenges posed by GST fraud are addressed, pointing toward more effective taxpayer risk prediction.

Shreepradha Hegde

Shreepradha is an accomplished Associate Lead Consultant at AIM, showcasing expertise in AI and data science, specifically Generative AI. With a wealth of experience, she has consistently demonstrated exceptional skills in leveraging advanced technologies to drive innovation and insightful solutions. Shreepradha's dedication and strategic mindset have made her a valuable asset in the ever-evolving landscape of artificial intelligence and data science.