AI-Powered Image-to-JSON Conversion for LLM Fine-Tuning Using ‘Outlines’

Generate structured JSON datasets from images using Outlines, a Python library crafted for robust and accurate Large Language Model (LLM) interactions.

Extracting structured data from images is a crucial step for fine-tuning large language models (LLMs) to handle domain-specific tasks. This hands-on guide demonstrates how to transform receipt images into JSON datasets using Outlines, a Python library designed for simplifying workflows with LLMs. By leveraging vision-language models (VLMs) like Qwen-2-VL or Pixtral, you’ll learn to process images, extract key details such as store names, item lists, and totals, and prepare structured outputs ready for fine-tuning. Whether you’re a researcher or developer, this guide equips you with practical tools to streamline dataset creation.

Table of Content

  1. Introduction to Outlines
  2. Understanding Outlines Architecture
  3. Practical Implementation

Introduction to Outlines

Outlines is a powerful Python library designed to simplify and enhance text generation workflows with Large Language Models (LLMs). Built by .txt, Outlines excels in structured generation, ensuring outputs like valid JSON or text adhering to complex patterns like regex.

It supports both OpenAI and cutting-edge open-source models through integrations with Transformers, llama.cpp, and others, making it versatile for production use. 

With features like robust prompt templating, JSON schema compliance, and seamless ecosystem compatibility, Outlines empowers developers to create reliable, efficient LLM applications with minimal overhead during inference.

Understanding Outlines’s Architecture

Outlines uses a structured generation framework to guide language models in producing text that conforms to predefined rules, such as JSON schemas or regex patterns. Unlike traditional LLM workflows that consider all potential tokens at every step, Outlines restricts generation to legal tokens only. This is achieved through integration with finite-state machines or grammar-based automata, ensuring that the output aligns with strict structural requirements.

In structured generation, rules like regex patterns are transformed into automata. For instance, a regex for decimal numbers (^\d*(\.\d+)?$) is converted into a finite-state machine that defines all permissible token transitions. If the generated sequence so far is “748,” the automata highlights valid next steps: additional digits, a decimal point, or sequence termination. This mapping ensures that only valid transitions occur.

During each generation step, the model processes the current sequence to produce token logits, representing the probabilities of potential next tokens. Outlines’ architecture modifies these logits, setting probabilities for illegal tokens to zero. This filtering narrows the token space to only valid options, from which the next token is sampled. For example, continuing “748” under the decimal number pattern may yield “748.92,” adhering precisely to the automata-defined rules.

The structured generation process provides robust outputs for use cases requiring strict formatting, such as JSON APIs or structured document creation. By combining automata-based constraints with dynamic logits processing, Outlines ensures both reliability and precision in text generation.

Practical Implementation

Step 1 : Install Required Libraries

Install the dependencies including Outlines, transformers, and additional Python libraries:

We will be using Outlines 0.1.3, as the latest version is unstable while using outlines.generate.json function.

Step 2 : Import Necessary Libraries

Import essential modules for handling language models, image processing, and structured data representation:

Step 3 : Initialize the Model

Define the model class and initialize the vision-language transformer model:
We are currently using the Qwen 2B model, which provides decent results. However, for more accurate and refined outputs, you can consider using more powerful models such as Qwen2.5-72B, which has 72 billion parameters. Alternatively, other models that align with your system specifications could also be utilized for better performance

Step 4 : Image Preprocessing

Load and resize an image to ensure it fits the model’s input size:

Input:

Outlines Input

Step 5 : Define Schema with Pydantic

Create data classes for receipt information extraction:

Step 6 : Generate the Prompt

Prepare a detailed prompt to feed into the model:

Step 7 : Set Up the Generator

Create a structured JSON generator using the Outlines library:

Step 8 : Generate the Output

Process the prompt and the image to extract receipt data:

These steps enable you to load a receipt image, process it using structured generation with Outlines, and extract detailed information in a JSON format. ​​

Output:

Step 9 : Save the Output to a JSON File

Save the extracted receipt data to a .json file for further use:

This step ensures that the structured data generated by the model is stored in a JSON file, making it easy to share, process, or integrate with other applications.

Final Words

In conclusion, this guide demonstrates how to efficiently create a structured JSON dataset from images using Outlines. By leveraging the power of structured generation and ensuring that only valid tokens are considered, you can maintain data integrity while preparing it for LLM fine-tuning. The final output is saved in a JSON file, ensuring that your dataset is ready for seamless integration with machine learning models, facilitating efficient model training and deployment. This method streamlines the process, ensuring both accuracy and scalability.

References

  1. Outline’s Github Repository
  2. Outline’s Official Documentation
Picture of Aniruddha Shrikhande

Aniruddha Shrikhande

Aniruddha Shrikhande is an AI enthusiast and technical writer with a strong focus on Large Language Models (LLMs) and generative AI. Committed to demystifying complex AI concepts, he specializes in creating clear, accessible content that bridges the gap between technical innovation and practical application. Aniruddha's work explores cutting-edge AI solutions across various industries. Through his writing, Aniruddha aims to inspire and educate, contributing to the dynamic and rapidly expanding field of artificial intelligence.

The Chartered Data Scientist Designation

Achieve the highest distinction in the data science profession.

Elevate Your Team's AI Skills with our Proven Training Programs

Strengthen Critical AI Skills with Trusted Generative AI Training by Association of Data Scientists.

Our Accreditations

Get global recognition for AI skills

Chartered Data Scientist (CDS™)

The highest distinction in the data science profession. Not just earn a charter, but use it as a designation.

Certified Data Scientist - Associate Level

Global recognition of data science skills at the beginner level.

Certified Generative AI Engineer

An upskilling-linked certification initiative designed to recognize talent in generative AI and large language models

Join thousands of members and receive all benefits.

Become Our Member

We offer both Individual & Institutional Membership.