A Deep Dive into Chain of Draft Prompting

Chain of Draft (CoD) optimizes LLM efficiency by reducing verbosity while maintaining accuracy. It cuts token usage, lowers costs, and speeds up inference for real-world AI applications.

Reasoning strategies such as Chain of Thought (CoT) prompting have transformed how Large Language Models (LLMs) solve problems. However, even as CoT boosts accuracy, it considerably raises latency, inference cost, and token usage. This has prompted researchers to develop a new prompting technique called Chain of Draft (CoD), which reduces verbosity without sacrificing reasoning accuracy. By cutting token use while achieving performance on par with or better than CoT, CoD makes LLMs more efficient and economical. This article explores the concepts, design, and real-world uses of CoD.

Table of Contents

  1. Chain of Draft Introduction
  2. How CoD Works
  3. Illustrative Example
  4. CoD vs. CoT: A Comparative Analysis
  5. Key Features and Benefits
  6. Practical Use Cases
  7. Technical Deep Dive

Let’s start by understanding what “Chain of Draft” is.

Chain of Draft Introduction

Chain of Draft (CoD) is a new prompting strategy for LLMs that encourages the generation of minimalistic, yet informative, intermediate reasoning outputs. By prioritizing efficiency and reducing verbosity, CoD enables LLMs to maintain or surpass the accuracy of CoT while significantly decreasing token usage, computational cost, and latency. This approach makes LLMs more practical for real-world applications where efficiency is crucial.

How CoD Works

CoD draws inspiration from human problem-solving habits. When working through multi-step reasoning, people frequently jot down key details or rough drafts to help them think through the process. CoD mimics this by encouraging LLMs to produce succinct, information-dense outputs at every stage, emphasizing essential ideas and cutting out superfluous detail. Chain-of-Thought (CoT), in contrast, frequently produces verbose reasoning steps and higher token use.
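To make the contrast concrete, the snippet below sketches how the three prompting styles might be phrased as system instructions. The wording, including the five-word budget per step and the #### answer separator, is a paraphrase of the style described in the CoD paper rather than a canonical prompt.

```python
# Sketch of the three prompting styles as system instructions.
# The exact wording is an assumption for illustration, not an official prompt.

STANDARD_PROMPT = (
    "Answer the question directly. "
    "Do not return any preamble, explanation, or reasoning."
)

COT_PROMPT = (
    "Think step by step to answer the question. "
    "Return the answer at the end of the response after a separator ####."
)

COD_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with five words at most. "
    "Return the answer at the end of the response after a separator ####."
)
```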

Illustrative Example: Solving a Simple Math Problem

Imagine a scenario in which Jason starts with twenty lollipops and gives some to Denny, leaving him with just twelve. The task is to find out how many he gave away.

With a standard answering strategy, an LLM might simply output 8. This gives the answer but leaves the reasoning process opaque. A Chain-of-Thought response would break it into several stages: Jason starts with twenty lollipops. After giving some to Denny, he has twelve left. The number given away is computed as 20 – 12. Answer: 8.

Although clear, this response is verbose and inefficient. With a Chain-of-Draft approach, the reasoning would instead be expressed as: 20 – x = 12; x = 20 – 12 = 8. This succinct draft still shows the logical process while removing unnecessary information.
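As a rough sketch of how this might look in practice, the snippet below sends the lollipop question to a model with a CoD-style instruction. It assumes the OpenAI Python SDK and the gpt-4o model name purely for illustration; any chat-completion API would work the same way.

```python
# Minimal sketch using the OpenAI Python SDK; model name and prompt wording
# are illustrative assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COD_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with five words at most. Return the answer after a separator ####."
)
QUESTION = (
    "Jason had 20 lollipops. He gave Denny some lollipops. "
    "Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": COD_PROMPT},
        {"role": "user", "content": QUESTION},
    ],
)

print(response.choices[0].message.content)   # e.g. "20 - x = 12; x = 8  #### 8"
print("Output tokens:", response.usage.completion_tokens)
```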

CoD vs. CoT: A Comparative Analysis

To assess CoD, researchers compared it with CoT and standard prompting across a variety of reasoning tasks. Experiments on arithmetic reasoning, commonsense reasoning, and symbolic reasoning show that CoD achieves accuracy similar to CoT while dramatically lowering token usage and inference latency.

For instance, on arithmetic reasoning tasks from the GSM8K benchmark, GPT-4o with standard prompting scored 53.3% accuracy, while CoT raised accuracy to 95.4% at the cost of generating an average of 205.1 tokens per response. CoD, by contrast, achieved 91.1% accuracy with just 43.9 tokens per response, cutting token usage by roughly 80% and inference latency by 76.2%. The commonsense and symbolic reasoning benchmarks showed similar patterns: CoD consistently maintained strong accuracy while drastically reducing computational overhead.
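A quick back-of-the-envelope calculation shows why the token reduction matters at scale. The per-token price and request volume below are hypothetical placeholders; only the average token counts come from the figures reported above.

```python
# Back-of-the-envelope cost comparison based on the reported token counts.
# PRICE_PER_1K_OUTPUT_TOKENS and requests_per_day are hypothetical values.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01   # assumed price in USD, for illustration only
requests_per_day = 100_000

cot_tokens = 205.1   # average output tokens per response with CoT (reported)
cod_tokens = 43.9    # average output tokens per response with CoD (reported)

def daily_cost(avg_tokens: float) -> float:
    """Estimated daily spend on output tokens at the assumed price and volume."""
    return requests_per_day * avg_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

cot_cost, cod_cost = daily_cost(cot_tokens), daily_cost(cod_tokens)
print(f"CoT: ${cot_cost:,.2f}/day  CoD: ${cod_cost:,.2f}/day  "
      f"savings: {100 * (1 - cod_cost / cot_cost):.1f}%")
# With these assumptions, CoD cuts output-token spend by roughly 79%.
```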

CoD’s main benefit is its capacity to strike a balance between accuracy and efficiency. Although CoT’s verbosity is helpful for sophisticated thinking, it frequently leads to needless processing costs. CoD reduces this by condensing reasoning into the bare minimum of information required, which makes it perfect for AI applications that are cost-sensitive.

Key Features and Benefits

Token Efficiency
  • Uses as little as 7.6% of CoT tokens while maintaining accuracy.
  • Reduces API costs for cloud-based inference models.

Lower Latency
  • Speeds up responses by limiting intermediate steps.
  • Achieves up to 76% latency reduction compared to CoT.

Improved Interpretability
  • Generates concise drafts without losing reasoning traceability.
  • Eliminates unnecessary details while preserving correctness.

Scalability for Real-World Applications
  • Ideal for low-latency environments like chatbots and automated assistants.
  • Reduces cloud inference costs for production-scale LLM applications.

Practical Use Cases

For tasks involving mathematical and logical reasoning, CoD is especially well-suited due to its effectiveness and interpretability. When used to solve problems in fields such as physics, engineering, and algorithm development, CoD minimizes computational overhead while maintaining the accuracy of reasoning steps. Because of this, it is useful for AI-powered learning resources and automated tutoring programs that require clear, sequential explanations.

Additionally, CoD is very suitable for enterprise AI adoption, where scalability and cost effectiveness are critical factors. CoD can be used by large-scale AI systems that work in legal reasoning, financial analysis, and customer service to lower API costs without sacrificing response quality. Organizations can deploy additional AI instances without going over budget by reducing the computational load.

Technical Deep Dive

In experiments, CoD was evaluated on arithmetic, commonsense, and symbolic reasoning tasks, using models like GPT-4o and Claude 3.5 Sonnet. 

Arithmetic Reasoning: On the GSM8K benchmark, CoD achieved accuracy comparable to CoT, but with significantly reduced token usage and latency. For instance, with GPT-4o, CoD achieved 91.1% accuracy with 43.9 tokens, compared to CoT’s 95.4% accuracy with 205.1 tokens.
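For readers who want to reproduce numbers like these on a small sample, the sketch below outlines one possible evaluation loop. It assumes the Hugging Face datasets library for GSM8K, the OpenAI Python SDK, and a naive exact-match check on the text after the #### separator; the paper's actual harness may differ.

```python
# Rough GSM8K evaluation sketch (not the paper's harness). Assumes the Hugging Face
# `datasets` library and the OpenAI Python SDK; prompt wording and model are assumptions.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()
COD_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with five words at most. Return the answer after a separator ####."
)

sample = load_dataset("gsm8k", "main", split="test").select(range(20))  # small sample

correct, total_tokens = 0, 0
for example in sample:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": COD_PROMPT},
            {"role": "user", "content": example["question"]},
        ],
    )
    prediction = resp.choices[0].message.content.split("####")[-1].strip()
    gold = example["answer"].split("####")[-1].strip()  # GSM8K answers end with "#### <number>"
    correct += prediction == gold
    total_tokens += resp.usage.completion_tokens

print(f"accuracy: {correct / len(sample):.1%}, "
      f"avg output tokens: {total_tokens / len(sample):.1f}")
```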

Figure: GSM8K evaluation results on small language models.

Commonsense Reasoning: In date and sports understanding tasks, CoD again demonstrated reduced latency and cost, with fewer tokens and, in some cases, higher accuracy than CoT.

Symbolic Reasoning: On coin flip tasks, CoD matched CoT’s perfect accuracy but with a significant reduction in tokens.

However, CoD’s effectiveness was less consistent in zero-shot settings and with smaller language models.

Final Words

By drastically lowering token usage and inference latency while preserving high accuracy, Chain of Draft (CoD) marks a significant step forward in LLM efficiency. It streamlines reasoning without compromising interpretability, making it a practical technique for real-world AI applications. As AI deployment grows, techniques like CoD will be essential to keeping systems responsive, efficient, and affordable.

References

Chain of Draft Research Paper
