Large Language Model for Multi-Modal Check Extraction – Indrajit Kar

This talk sheds light on an innovative approach to check extraction built on Generative AI and LLMs, and on its potential to revolutionize the industry.

Check extraction, a crucial step in automating information retrieval from documents, has seen significant advancements with the introduction of Generative AI and large language models. Indrajit Kar’s talk on “Large Language Model for Multi-Modal Check Extraction” sheds light on this innovative approach and its potential to revolutionize the industry. This article explores the key highlights and findings from the talk, showcasing the capabilities and impact of this cutting-edge technology.

Unleashing the Power of Large Language Models

The talk emphasizes the transformative potential of large language models in automating check extraction, an area that still relies heavily on manual effort. Indrajit Kar highlights the research presented in the IEEE paper “Deep Check,” which uses a distinct neural network for extraction and was honoured with the Best Paper Award.

Exploring the Model’s Capabilities

During the demonstration, the large language model extracted data from checks, receipts, and even handwritten text. The model has 1.2 billion parameters and was trained for 28 days on AWS. It is built on a Hugging Face Transformer and distinguishes between handwritten and non-handwritten areas before extraction. The extraction process employs Connectionist Temporal Classification (CTC) and focal loss to compare predicted probabilities with ground-truth labels.
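The two loss-related ideas named above can be illustrated with a minimal sketch. This is not the code from the talk: the function names, the default `gamma` and `alpha` values, and the choice of label `0` as the CTC blank are assumptions made for illustration. The first function computes the standard focal loss for a single prediction, which down-weights examples the model already classifies confidently. The second applies the standard CTC collapsing rule (merge repeated labels, then drop blanks) to a sequence of per-frame predictions, which is how a CTC-trained recognizer turns frame-level output into a character sequence.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one prediction.

    p_true is the probability the model assigned to the correct class.
    gamma and alpha are illustrative defaults, not values from the talk.
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks (CTC collapsing rule).

    frame_labels is the most likely label per time step; using 0 as the
    blank symbol is an assumption of this sketch.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, so `gamma` controls how strongly easy examples are discounted. In the decoder, a blank between two identical labels is what allows a genuinely repeated character (for example a doubled digit in an account number) to survive the collapse.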

Collaboration and Challenges

To overcome challenges such as locating the MICR (Magnetic Ink Character Recognition) number on a check, the authors collaborated closely with Raster I See You. Together they developed prompts to extract information from a hundred checks, demonstrating the importance of collaboration and domain-specific expertise in refining the extraction system.

Innovative Techniques and Reinforcement Learning

The talk also delves into the technical aspects of the solution. The model uses a double-headed encoder and a custom decoder, while prompt augmentation with ChatGPT and a vector database for embeddings streamlines the process. Similarity metrics, transformers, and chain-of-thought prompting help break information down into manageable pieces. Reinforcement learning is used to refine the prompts, improving the model’s performance and eliminating non-relevant areas.
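To make the combination of a vector database, similarity metrics, and prompt augmentation more concrete, here is a minimal sketch under stated assumptions. It does not reproduce the system from the talk: the in-memory list `EXAMPLE_STORE`, its three-dimensional toy embeddings, and the helper names are all hypothetical, and cosine similarity is used as one common choice of similarity metric. A real setup would obtain embeddings from a model and store them in a dedicated vector database.

```python
import math

# Hypothetical store: (hint text, toy embedding). Real embeddings would come
# from an embedding model and have hundreds of dimensions.
EXAMPLE_STORE = [
    ("The MICR line runs along the bottom edge of the check.", [1.0, 0.0, 0.0]),
    ("The payee name follows 'Pay to the order of'.", [0.0, 1.0, 0.0]),
    ("The amount in figures appears in a box on the right.", [0.0, 0.0, 1.0]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k stored texts whose embeddings are closest to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def augment_prompt(base_prompt, query_vec, store, k=2):
    """Append the most similar stored hints to the base prompt."""
    hints = top_k(query_vec, store, k)
    hint_lines = "\n".join("- " + hint for hint in hints)
    return base_prompt + "\n\nRelevant hints:\n" + hint_lines
```

For example, `augment_prompt("Extract the MICR number.", [0.9, 0.1, 0.0], EXAMPLE_STORE, k=1)` appends only the MICR hint, because its toy embedding is the closest to the query vector.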

Applications and Future Prospects

The application of this advanced check extraction solution extends beyond its immediate domain. By helping users understand causality and the model’s decision-making process, it holds promise for other industries that need to make sense of complex data points.

Conclusion

Indrajit Kar’s talk on the large language model for multi-modal check extraction reveals the significant strides made in automating a traditionally laborious process. With the power of Generative AI and large language models, the potential for increased efficiency and accuracy in check extraction is within reach. As researchers continue to refine and expand these technologies, we can anticipate broader implications across various industries, leading us towards a more automated future.
