Large language model for multi-modal cheques extraction – Indrajit Kar

This talk sheds light on an innovative approach to check extraction with Generative AI and LLMs and its potential to revolutionize the industry.

Check extraction, a crucial step in automating information retrieval from documents, has seen significant advances with the introduction of Generative AI and large language models (LLMs). Indrajit Kar’s talk, “Large Language Model for Multi-Modal Check Extraction,” sheds light on this approach and its potential to revolutionize the industry. This article summarizes the key highlights and findings from the talk.

Unleashing the Power of Large Language Models

The talk emphasizes the transformative potential of large language models in automating check extraction, an area that still relies heavily on manual efforts. Indrajit Kar highlights the breakthrough research showcased in the IEEE paper “Deep Check,” which leverages a distinct neural network for extraction and was honoured with the Best Paper Award.

Exploring the Model’s Capabilities

During the demonstration, the large language model extracted data from checks, receipts, and even handwritten text. The model has 1.2 billion parameters and was trained for 28 days on AWS. It employs a Hugging Face Transformer and distinguishes between handwritten and non-handwritten areas for extraction. The extraction process uses Connectionist Temporal Classification (CTC) and Focal Loss, two loss functions that compare predicted probabilities with ground truth labels.
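To make the two loss-related ideas concrete, here is a minimal sketch in plain Python. It is not the model from the talk: the function names, the three-symbol alphabet, and the per-frame probabilities are all illustrative assumptions. It shows the standard focal loss formula, which down-weights examples the model already classifies confidently, and best-path CTC decoding, which turns per-frame predictions into a character sequence by merging repeats and dropping the blank symbol.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 and alpha = 1 this reduces to cross-entropy.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def ctc_greedy_decode(frames, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


# Hypothetical alphabet: index 0 is the CTC blank symbol.
alphabet = ["-", "1", "2"]

# Hypothetical per-frame probabilities over the alphabet (7 time steps).
frames = [
    [0.10, 0.80, 0.10],  # "1"
    [0.20, 0.70, 0.10],  # "1" again (repeat, merged)
    [0.90, 0.05, 0.05],  # blank separates two genuine "1"s
    [0.10, 0.80, 0.10],  # "1"
    [0.10, 0.10, 0.80],  # "2"
    [0.10, 0.20, 0.70],  # "2" again (repeat, merged)
    [0.80, 0.10, 0.10],  # blank
]

decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)  # 112

# An easy example (p_t = 0.9) contributes far less under focal loss.
easy = [0.1, 0.9]
print(cross_entropy(easy, 1), focal_loss(easy, 1))
```

The blank symbol is what lets CTC represent a doubled character such as "11": without the blank frame in between, the two runs of "1" would be merged into one.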

Collaboration and Challenges

To overcome challenges such as identifying the location of the MICR number, the authors collaborated closely with Raster I See You. They developed prompts to extract information from a hundred checks, demonstrating the importance of collaboration and domain-specific expertise in refining the extraction system.

Innovative Techniques and Reinforcement Learning

The talk further delves into the technical aspects of the solution. A double-headed encoder and a custom decoder are employed, while prompt augmentation using ChatGPT and a vector database for embeddings streamlines the process. Similarity metrics, transformers, and chain-of-thought prompting help break information down into manageable pieces. Reinforcement learning is used to refine prompts, enhancing the model’s performance and eliminating non-relevant areas.
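The retrieval side of prompt augmentation can be sketched as follows. This is an illustrative assumption, not the system described in the talk: `embed` is a toy bag-of-words stand-in for a learned embedding model, `TinyVectorStore` is a hypothetical in-memory substitute for a vector database, and the stored hints are invented. The sketch shows how a similarity metric (cosine similarity here) selects the most relevant stored text, which is then prepended to the prompt along with a step-by-step instruction.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (text, embedding) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def augment_prompt(question, store, k=2):
    """Prepend retrieved hints and a step-by-step instruction to the task."""
    hints = store.search(question, k)
    lines = ["Relevant extraction hints:"]
    lines += [f"- {h}" for h in hints]
    lines += ["Think step by step, one field at a time.", f"Task: {question}"]
    return "\n".join(lines)


# Invented hints, for illustration only.
store = TinyVectorStore()
store.add("The MICR number is printed along the bottom edge of the check")
store.add("The payee name follows the words pay to the order of")
store.add("The amount in figures appears in a box on the right")

prompt = augment_prompt("Where is the MICR number on this check", store, k=1)
print(prompt)
```

In a production setting the bag-of-words vectors would be replaced by dense embeddings and the linear scan by an approximate nearest-neighbour index, but the control flow (embed, rank by similarity, splice into the prompt) stays the same.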

Applications and Future Prospects

The application of this advanced check extraction solution extends beyond its immediate domain. By exposing causality and the model’s decision-making process, the approach holds promise for other industries seeking to comprehend complex data points.

Conclusion

Indrajit Kar’s talk on the large language model for multi-modal check extraction reveals the significant strides made in automating a traditionally laborious process. With the power of Generative AI and large language models, the potential for increased efficiency and accuracy in check extraction is within reach. As researchers continue to refine and expand these technologies, we can anticipate broader implications across various industries, leading us towards a more automated future.

Vaibhav Kumar