Intelligent Document Processing with No-Code LLM Platform Unstract

Unstract automates document processing with AI, reducing manual effort, errors, and costs.

Manually processing and understanding complex documents such as contracts, invoices, legal briefs, and medical records is time-consuming, expensive, and prone to human error. Unstract, a no-code AI-powered platform, is designed to address these challenges. Using LLMs and advanced OCR techniques, it automates the complete document lifecycle, from ingestion and extraction to transformation and export. This article explores Unstract and walks through a practical implementation.

Understanding Unstract
Overview of LLMWhisperer
Hands-on Implementation of Unstract

Understanding Unstract

Unstract is a no-code platform for automating complex business processes that involve complicated documents, with a human in the loop. It combines intelligent document processing (IDP) and robotic process automation (RPA), both made more capable by large language models. It is available in three editions, in which users can experiment with and deploy LLM workflows that parse and process documents without writing code.

The three editions are Unstract Cloud, open-source, and on-premise. The cloud edition is a fully managed, hosted version and the easiest way for users to get started and experiment with the platform. It offers a 14-day trial that includes enterprise-only features such as LLMChallenge, SinglePass Extraction, Summarized Extraction, Human Quality Review, and SSO support. As enterprise features, these are also available in the on-premise edition, but not in the open-source edition.

LLMChallenge uses two LLMs to cross-check each extraction: the output of one model is verified by a second, so users get either a response both models agree on or no response at all. SinglePass Extraction optimizes information extraction from documents: instead of sending a separate prompt to the LLM for each piece of information, it combines all the user prompts into a single, larger prompt, which reduces token usage and cost. Summarized Extraction uses LLMs to extract key information from documents and present it in a concise, organized form. It goes beyond simple extraction because it relies on the context and relationships within the document to produce a meaningful summary of the extracted information.
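To make the SinglePass saving concrete, the following is a minimal sketch, not Unstract's implementation, that compares resending the document once per field with folding every field into one prompt. The prompt wording, the helper names, the "ACME Bank" sample text, and the word-count token estimate are all illustrative assumptions; real token counts depend on the model's tokenizer.

```python
# Sketch only: compares per-field prompts with one combined "single pass"
# prompt. Prompt wording, helper names, and the word-count token estimate are
# illustrative assumptions, not Unstract's implementation.


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def build_per_field_prompts(document: str, questions: dict[str, str]) -> list[str]:
    # One request per field: the whole document is resent every time.
    return [f"{document}\n\nQuestion: {q}\nAnswer:" for q in questions.values()]


def build_single_pass_prompt(document: str, questions: dict[str, str]) -> str:
    # One request for all fields: the document appears once and the model is
    # asked to answer every question in a single JSON object.
    lines = [f'- "{key}": {q}' for key, q in questions.items()]
    return (
        f"{document}\n\n"
        "Answer each question below and return a single JSON object "
        "whose keys are the quoted field names:\n" + "\n".join(lines)
    )


if __name__ == "__main__":
    doc = "ACME Bank credit card statement " * 200  # stand-in for a long document
    questions = {
        "issuer": "What is the name of the issuer or the bank who has issued this credit card?",
        "client_name": "What is the name of the client?",
        "closing_date": "What is the closing date?",
    }
    separate = sum(estimate_tokens(p) for p in build_per_field_prompts(doc, questions))
    single = estimate_tokens(build_single_pass_prompt(doc, questions))
    print(f"per-field prompts: ~{separate} tokens, single pass: ~{single} tokens")
```

The gap grows with both document length and the number of fields, because the per-field approach pays for the full document once per question.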

Human Quality Review presents the extracted values side by side with the source document, highlighting the source segment each value came from, so that a human can review them. SSO (single sign-on) support lets users access multiple applications with a single set of login credentials, which makes managing user access both more convenient and more secure.
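The core idea behind source highlighting, linking each extracted value back to where it appears, can be sketched on plain text. This is a hypothetical illustration: the helper names are invented here, the sample statement text is made up, and Unstract's own review screen works on the rendered source document rather than on strings.

```python
# Sketch only: find where each extracted value occurs in the source text so a
# reviewer can check it in context. Helper names are hypothetical, and real
# review tools work on the rendered document rather than on plain strings.
from typing import Optional


def find_source_span(text: str, value: str) -> Optional[tuple[int, int]]:
    # Case-insensitive search for the first occurrence of the value.
    # Offsets assume lower() keeps the string length (true for ASCII text).
    start = text.lower().find(value.lower())
    if start == -1:
        return None  # not found verbatim: the value needs a closer look
    return start, start + len(value)


def review_table(text: str, extracted: dict[str, str]) -> list[tuple[str, str, str]]:
    # Build (field, value, context) rows; the match is wrapped in [[ ]].
    rows = []
    for field, value in extracted.items():
        span = find_source_span(text, value)
        if span is None:
            rows.append((field, value, "NOT FOUND IN SOURCE"))
            continue
        s, e = span
        left, right = max(0, s - 20), min(len(text), e + 20)
        rows.append((field, value, text[left:s] + "[[" + text[s:e] + "]]" + text[e:right]))
    return rows


if __name__ == "__main__":
    source = "Statement for Jane Doe. Account ending 1234. Closing date 03/15/2024."
    extracted = {"client": "Jane Doe", "account_ending": "1234", "issuer": "ACME Bank"}
    for row in review_table(source, extracted):
        print(row)
```

Values that cannot be located verbatim are the natural candidates to route to a human first.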

The open-source edition lets users try Unstract without a subscription or account. Users can clone the official GitHub repository (https://github.com/Zipstack/unstract) and run the platform and its services locally. The on-premise edition can be installed on any infrastructure that supports Docker or Kubernetes, and it supports the three major cloud service providers: AWS, Azure, and GCP.

Unstract Workflow

Unstract supports the automation of complex business processes that involve long, complex documents and human review, using the following flow –

It features Prompt Studio, a no-code environment for engineering prompts against complex documents. Prompt Studio supports custom document types and lets users combine multiple LLMs, vector DBs, embedders, and text extraction tools such as LlamaParse. It also provides prompt monitoring, evaluation of prompt success across multiple sample documents, and tools for building structured information from unstructured data files.
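Evaluating prompt success across several sample documents boils down to comparing each prompt's output with the expected value per document. The sketch below assumes a simple data layout (document name mapped to field values) and a lenient comparison rule (case and surrounding whitespace ignored); both are illustrative choices, not how Prompt Studio scores prompts, and the bank names are made-up sample data.

```python
# Sketch only: score prompt outputs against expected values across several
# sample documents. The data layout and the normalisation rule are
# assumptions made for illustration.


def normalise(value: str) -> str:
    # Compare case-insensitively and ignore surrounding whitespace.
    return value.strip().lower()


def field_accuracy(outputs: dict[str, dict[str, str]],
                   expected: dict[str, dict[str, str]]) -> dict[str, float]:
    """outputs/expected map document name -> {field name -> value}."""
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for doc, truth in expected.items():
        got = outputs.get(doc, {})  # a missing document counts as all misses
        for field, want in truth.items():
            totals[field] = totals.get(field, 0) + 1
            if normalise(got.get(field, "")) == normalise(want):
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / n for field, n in totals.items()}


if __name__ == "__main__":
    expected = {"stmt_a.pdf": {"issuer": "ACME Bank", "closing_date": "03/15/2024"},
                "stmt_b.pdf": {"issuer": "Globex Bank", "closing_date": "04/15/2024"}}
    outputs = {"stmt_a.pdf": {"issuer": "acme bank ", "closing_date": "03/15/2024"},
               "stmt_b.pdf": {"issuer": "Globex Bank", "closing_date": "15 April 2024"}}
    print(field_accuracy(outputs, expected))  # {'issuer': 1.0, 'closing_date': 0.5}
```

A low score on one field (here the date, returned in a different format) points at the specific prompt that needs rewording.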

Workflows can pull unstructured data from sources such as AWS S3, Dropbox, Google Drive, Google Cloud Storage, SFTP/SSH, and Azure Cloud Storage. Users can also send the processed, structured data to destinations such as MariaDB, Snowflake, Redshift, MSSQL/MySQL, OracleDB, PostgreSQL, and BigQuery.
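Unstract's connectors handle the export step without code, but it helps to see what loading structured results into a database amounts to. In this minimal sketch, Python's built-in sqlite3 stands in for a real destination such as PostgreSQL or MySQL, and the table name, columns, and sample values are hypothetical. `INSERT OR REPLACE` is SQLite syntax; PostgreSQL would use `INSERT ... ON CONFLICT` instead.

```python
# Sketch only: load structured results into a SQL table. sqlite3 stands in for
# a real destination such as PostgreSQL or MySQL; the table and column names
# are hypothetical, and INSERT OR REPLACE is SQLite syntax.
import json
import sqlite3


def load_results(conn: sqlite3.Connection, results: list[dict]) -> int:
    """Upsert one row per processed file and return the table's row count."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "CREATE TABLE IF NOT EXISTS extraction_results ("
            "file_name TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        # Extracted fields are stored as JSON text so documents with different
        # fields can share one table.
        conn.executemany(
            "INSERT OR REPLACE INTO extraction_results (file_name, data) VALUES (?, ?)",
            [(r["file_name"], json.dumps(r["data"])) for r in results],
        )
    return conn.execute("SELECT COUNT(*) FROM extraction_results").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [{"file_name": "statement_a.pdf",
             "data": {"issuer": "ACME Bank", "account_ending": "1234"}}]
    print(load_results(conn, rows))  # 1
```

Keying the table on the file name makes re-running a workflow on the same file overwrite its row instead of duplicating it.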

Workflows can also be deployed as APIs: users POST unstructured documents to the API and receive structured JSON data in response. More generally, a workflow can be deployed as an unstructured data API, an unstructured data ETL pipeline, or a custom Q&A app.
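A client for such an API only needs to send a multipart POST and parse the JSON reply. The sketch below uses only the Python standard library. Several details are ASSUMPTIONS to verify against the API page of your own deployment: the endpoint URL, the `Bearer` Authorization header, and the form field name `files`. The function names are hypothetical.

```python
# Sketch only. ASSUMPTIONS (check them on your deployment's API page): the
# endpoint URL, the "Bearer" Authorization header, and the multipart form
# field name "files". Uses only the Python standard library.
import json
import os
import urllib.request
import uuid


def build_upload_request(url: str, api_key: str, file_name: str,
                         content: bytes) -> urllib.request.Request:
    boundary = uuid.uuid4().hex
    # multipart/form-data body holding one file part named "files" (assumed).
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def post_document(url: str, api_key: str, path: str, timeout: float = 300) -> dict:
    # Read the file, send it, and decode the structured JSON response.
    with open(path, "rb") as fh:
        req = build_upload_request(url, api_key, os.path.basename(path), fh.read())
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `post_document(API_URL, API_KEY, "statement.pdf")`, with `API_URL` and `API_KEY` copied from the deployment's page. Building the request separately from sending it keeps the request format easy to inspect and test offline.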

Overview of LLMWhisperer

LLMWhisperer is another Unstract technology. It converts documents of different designs and formats into a form that LLMs can understand. It is available as an API that can be integrated with existing systems to preprocess documents before they are passed to an LLM. In the Unstract dashboard, LLMWhisperer is listed under text extractors alongside other parsers.

The recommended use cases of LLMWhisperer are summarized in the image below –

Hands-on Implementation of Unstract

The following steps cover two setups: a cloud account (14-day free trial) and the open-source edition run locally (the programmatic way).

Cloud Account (Free trial for 14 days) –

Step 1: Create a free account and log in to reach the dashboard. Select the Unstract option and get started with its cloud edition –

Step 2: Setting the LLM, Vector DB, Embedding, and Text Extractor –

By default, Azure GPT-4o is offered as the LLM. To use a different LLM, select New LLM Profile –

If we select OpenAI, we can choose the required model, enter the API key and other parameters, and test the connection –

Once the connection test succeeds, click the Submit button to add the LLM to your LLM list –

By default, the Unstract cloud account provides a free-trial Postgres vector DB. We can add our own vector DB the same way we added an LLM –

Let’s use Pinecone and provide the necessary details such as API key, Region, and Cloud Service Provider for the database –

Click on the submit button once the test connection is successful –

Embeddings and text extractors (such as LlamaParse) can be set up the same way as the LLM and vector DB. For this tutorial, we keep the default setup.

Step 3: Click Prompt Studio in the left panel and create a new project –

Next, set the LLM, vector DB, embedding, and text extractor profile using the settings option at the top right –

Step 4: Once the LLM profile is set, upload the documents to be parsed using the manage documents option –

Step 5: Add a new prompt – “What is the name of the issuer or the bank who has issued this credit card?”

Select run all LLMs for all documents to see the output –

Step 6: Check the output for another prompt – “What is the name of the client and also mention closing date and account ending.”

The output matches the PDF document –
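When these answers are consumed downstream rather than read on screen, it is worth validating them before use. The sketch below turns a JSON answer for the credit card prompts into a typed record and rejects incomplete ones. The field names, the MM/DD/YYYY date format, the four-digit rule for the account ending, and the sample values are hypothetical choices for illustration, not Unstract's output format.

```python
# Sketch only: validate the JSON answer for the credit card prompts. Field
# names, the MM/DD/YYYY date format, and the 4-digit account-ending rule are
# hypothetical choices, not Unstract's output format.
import json
from dataclasses import dataclass
from datetime import date, datetime

REQUIRED = ("issuer", "client_name", "closing_date", "account_ending")


@dataclass(frozen=True)
class StatementFields:
    issuer: str
    client_name: str
    closing_date: date
    account_ending: str


def parse_statement(raw: str) -> StatementFields:
    data = json.loads(raw)
    missing = [k for k in REQUIRED if not data.get(k)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    ending = str(data["account_ending"])
    if not (ending.isdigit() and len(ending) == 4):
        raise ValueError(f"account_ending should be 4 digits, got {ending!r}")
    return StatementFields(
        issuer=data["issuer"].strip(),
        client_name=data["client_name"].strip(),
        # strptime raises ValueError if the date is not in MM/DD/YYYY form.
        closing_date=datetime.strptime(data["closing_date"], "%m/%d/%Y").date(),
        account_ending=ending,
    )


if __name__ == "__main__":
    raw = ('{"issuer": "ACME Bank", "client_name": "Jane Doe", '
           '"closing_date": "03/15/2024", "account_ending": "1234"}')
    print(parse_statement(raw))
```

Raising on a bad answer, rather than storing it, lets a pipeline route that document to human review.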

Open-source Programmatic Way –

Prerequisites for running the open-source version: make sure the following are installed on the system –

Docker
Docker Compose
Git

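Before running the setup script, a quick pre-flight check can confirm these tools are on the PATH. This is a minimal sketch that assumes Docker Compose is either the v2 plugin invoked as `docker compose` or the older standalone `docker-compose` binary; it does not check that the Docker daemon itself is running, which Step 2 also requires.

```python
# Sketch only: check the prerequisites listed above. Assumes Docker Compose is
# either the v2 plugin ("docker compose") or the standalone "docker-compose"
# binary. Does not check that the Docker daemon is running.
import shutil
import subprocess


def docker_compose_available() -> bool:
    if shutil.which("docker-compose"):
        return True  # standalone binary found on PATH
    if not shutil.which("docker"):
        return False
    try:
        result = subprocess.run(["docker", "compose", "version"],
                                capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0  # plugin present if the command succeeds


def check_prerequisites() -> dict[str, bool]:
    return {
        "docker": shutil.which("docker") is not None,
        "docker compose": docker_compose_available(),
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool:15} {'found' if ok else 'MISSING'}")
```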
Step 1: Clone the official Git repository using the command git clone https://github.com/Zipstack/unstract.git –

Step 2: Change into the cloned unstract directory and run ./run-platform.sh (make sure Docker is up and running).

Step 3: Visit http://frontend.unstract.localhost and log in with unstract as both the username and the password.

Step 4: Once logged in, the dashboard can be used just like the cloud edition described above.
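The containers can take a while to start after Step 2, so the frontend may not answer immediately. This sketch polls the Step 3 URL until it returns a successful response; the deadline and polling interval are arbitrary choices, and treating HTTP error statuses as "not ready yet" is an assumption about how the local proxy behaves during startup.

```python
# Sketch only: poll the local frontend until it answers. The URL is the one
# from Step 3; the deadline and polling interval are arbitrary choices.
import time
import urllib.error
import urllib.request


def wait_for_frontend(url: str = "http://frontend.unstract.localhost",
                      deadline_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Return True once the URL serves a successful response, False on timeout."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # success after any redirects: the frontend is serving
        except urllib.error.HTTPError as err:
            err.close()  # something answered with an error status; keep waiting
        except OSError:
            pass  # refused, DNS failure, or timeout: not reachable yet
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("frontend ready" if wait_for_frontend() else "frontend not reachable")
```

`HTTPError` is caught before `OSError` because it is a subclass of `URLError`, which in turn subclasses `OSError`.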

Final Words

Unstract offers a no-code approach to document processing that lets businesses of all sizes automate complex workflows without specialized technical expertise. By using LLMs, vector DBs, and embeddings in the backend, Unstract goes beyond simple data extraction: it takes into account the context and relationships within documents to deliver accurate, meaningful results. This makes it especially valuable for businesses that handle and work with unstructured data.

Sachin Tripathi

Sachin Tripathi is the Manager of AI Research at AIM, with over a decade of experience in AI and Machine Learning. An expert in generative AI and large language models (LLMs), Sachin excels in education, delivering effective training programs. His expertise also includes programming, big data analytics, and cybersecurity. Known for simplifying complex concepts, Sachin is a leading figure in AI education and professional development.
