
Thought-Augmented Reasoning through Buffer of Thoughts (BoT) 

Enhance the robustness and accuracy of LLMs through thought-augmented reasoning based on the Buffer of Thoughts approach.

The Buffer of Thoughts (BoT) approach is a recent development in large language model research. This thought-augmented reasoning approach enhances the accuracy, efficiency, and robustness of LLMs by introducing three novel concepts: the meta-buffer, the thought-template, and the buffer-manager. Implemented with Llama3-8B, BoT has the potential to surpass Llama3-70B. This article explores this research in detail.

Table of Contents

  1. Understanding BoT
  2. Components under BoT
  3. Evaluation Benchmarks

Understanding BoT

Implementation of effective prompting methods is a way to enhance the performance of LLMs. We can categorise the current state of prompting into two main categories: Single-query Reasoning and Multi-query Reasoning.

Frameworks like Chain-of-Thought (CoT) and Few-shot Prompting implement single-query reasoning methods, enhancing LLM reasoning through intermediate steps and providing query-relevant examples for response generation.

Multi-query reasoning methods decompose a query into a set of sub-questions. The LLM answers these sub-questions and combines the resulting knowledge to address the original query. Graph-of-Thoughts (GoT) and Tree-of-Thoughts (ToT) are examples of multi-query reasoning frameworks. GoT models the information generated by an LLM, termed LLM thoughts, as an arbitrary graph, and combines these thoughts into synergistic outcomes. Tree-of-Thoughts (ToT), on the other hand, generalises the CoT framework, allowing a language model to perform decision-making by considering multiple reasoning paths.
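The difference between the two categories can be sketched as prompt construction. This is a minimal illustration, not any framework's actual API; the `llm(prompt)` call it implies is a hypothetical stand-in for whatever chat-completion interface is in use:

```python
def cot_prompt(question: str) -> str:
    # Single-query reasoning (Chain-of-Thought): one prompt, with an
    # instruction nudging the model to produce intermediate steps.
    return f"Q: {question}\nA: Let's think step by step."

def multi_query_prompts(question: str, sub_questions: list[str]) -> list[str]:
    # Multi-query reasoning: the problem is decomposed into sub-questions,
    # each sent to the LLM separately before the answers are combined.
    return [
        f"Original problem: {question}\nSub-question: {sq}"
        for sq in sub_questions
    ]

# The decomposition itself would normally come from the LLM as well;
# here it is hand-written to keep the sketch self-contained.
prompts = multi_query_prompts(
    "What is 15% of 240 plus 12?",
    ["What is 15% of 240?", "What is that result plus 12?"],
)
```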

Comparison between CoT, ToT and GoT Methods

The limitations of single-query and multi-query reasoning processes stem from their reasoning structures and examples. The Buffer of Thoughts (BoT) approach sidesteps these limitations by employing a meta-buffer—a library containing a series of high-level thoughts (thought-template). These high-level thoughts, refined from various problem-solving processes, are shareable across multiple tasks. Each problem utilizes a relevant thought template, along with a specialized reasoning structure, enabling effective thought-augmented reasoning. BoT dynamically updates the meta-buffer using a buffer manager, enhancing it as more problems are solved.

Working of Buffer of Thoughts

BoT improves reasoning accuracy and efficiency by using informative historical reasoning structures, without needing to build them from scratch. The process from thought retrieval to thought instantiation is analogous to the human thought process, enabling the LLM to address similar problems consistently and significantly improving the model's robustness and precision.
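One pass of this retrieval-to-instantiation flow can be sketched as a pipeline of stubbed stages. Every function body below is an illustrative placeholder; in the actual system each stage is an LLM call, and the names are ours, not the paper's:

```python
def distill(problem: str) -> str:
    # Problem distiller: extract the key information and constraints.
    return f"[distilled info from: {problem}]"

def retrieve(info: str, meta_buffer: dict) -> str:
    # Meta-buffer retrieval: pick the most relevant thought template.
    # (Naive stub: just take the first template in the buffer.)
    return next(iter(meta_buffer.values()))

def instantiate_and_solve(info: str, template: str) -> str:
    # Thought instantiation: apply the high-level template to this instance.
    return f"solution via '{template}' for {info}"

def update(meta_buffer: dict, problem: str, solution: str) -> None:
    # Buffer manager: distil the solved process into a reusable template.
    meta_buffer.setdefault(problem, solution)

meta_buffer = {"arith": "identify operands, compute stepwise"}
task = "Game of 24 with cards 4, 7, 8, 8"
info = distill(task)
template = retrieve(info, meta_buffer)
solution = instantiate_and_solve(info, template)
update(meta_buffer, task, solution)
```

The point of the sketch is the control flow: nothing is reasoned from scratch; the solver always starts from a retrieved template, and the buffer grows as problems are solved.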

BoT Reasoning Process

Components under BoT

Buffer of Thoughts uses three primary components to implement the thought-augmented reasoning process:

Problem Distiller – During the reasoning phase, LLMs face three primary challenges – extracting vital information, understanding potential constraints, and applying accurate reasoning. BoT uses a problem distiller to extract problem-specific information along with the relevant constraints. The key elements extracted from input tasks are the parameters and variables for problem-solving, the objectives of the input problem, and their corresponding constraints. This information is crucial for decomposing problems and making them easier for the subsequent components to operate on.
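The distiller stage is essentially a structured extraction prompt. The prompt wording and the echo stub below are our own assumptions for demonstration, not the paper's exact prompt:

```python
# Illustrative sketch of the problem-distiller stage: prompt an LLM to pull
# out the parameters, objective, and constraints of a task.
DISTILLER_PROMPT = (
    "Extract from the problem below:\n"
    "1. Key parameters and variables\n"
    "2. The objective\n"
    "3. Any constraints\n\n"
    "Problem: {problem}"
)

def distill(problem: str, llm=None) -> str:
    prompt = DISTILLER_PROMPT.format(problem=problem)
    # In a real pipeline `llm` is a chat-completion call; this stub simply
    # echoes the prompt so the sketch stays self-contained and runnable.
    llm = llm or (lambda p: p)
    return llm(prompt)

out = distill("Use the numbers 4, 7, 8, 8 exactly once to reach 24.")
```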

Meta Buffer – This is a library containing a series of high-level thoughts in the form of thought templates. The buffer manager distils these templates from the problem-solving process. The system retrieves the most relevant thought template from the meta-buffer depending on the problem.
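Retrieval can be sketched as similarity scoring between the problem and each stored template. The paper uses embedding similarity; the keyword-overlap scoring and the template schema below are simplifying assumptions so the sketch runs without a model:

```python
# A toy meta-buffer: each entry pairs a high-level thought template with
# keywords standing in for an embedding.
META_BUFFER = [
    {"task": "arithmetic",
     "keywords": {"sum", "product", "number", "compute"},
     "template": "Identify the operands, choose operations, compute step by step."},
    {"task": "scheduling",
     "keywords": {"schedule", "deadline", "order", "task"},
     "template": "List constraints, order tasks by dependency, assign slots."},
]

def retrieve_template(problem: str) -> dict:
    # Score each template by keyword overlap with the problem statement
    # and return the best match (embedding similarity in the real system).
    words = set(problem.lower().split())
    return max(META_BUFFER, key=lambda t: len(words & t["keywords"]))
```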

Buffer Manager – Summarises the entire problem-solving process and distils high-level thoughts to increase the capacity of the meta-buffer. It follows a three-step process: identifying and summarising the problem's challenges, describing the solution steps, and providing a reusable solution template for similar problems.
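The manager's three steps, plus the rule that only genuinely new templates are stored, can be sketched as follows. The field names and the duplicate check are illustrative; the real system performs both steps with LLM prompts:

```python
def distil_template(problem: str, solution_steps: list[str]) -> dict:
    # Three-step distillation: summarise the challenge, describe the
    # solution, and phrase a reusable template for similar problems.
    return {
        "challenge": f"Problems like: {problem}",
        "solution_outline": " -> ".join(solution_steps),
        "template": "Apply the outlined steps to any structurally similar task.",
    }

def update_meta_buffer(meta_buffer: list[dict], new: dict) -> bool:
    # Skip templates whose solution outline duplicates an existing entry,
    # so the buffer grows only with genuinely novel high-level thoughts.
    if any(t["solution_outline"] == new["solution_outline"] for t in meta_buffer):
        return False
    meta_buffer.append(new)
    return True

buffer: list[dict] = []
tpl = distil_template(
    "Game of 24",
    ["list the numbers", "search operator combinations", "verify result equals 24"],
)
added = update_meta_buffer(buffer, tpl)
```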

Buffer Manager’s Operation Example

Evaluation Benchmarks

BoT achieves significant performance improvements over previous prompting methods across multiple challenging benchmarks such as Game of 24 and Checkmate-in-One.

BoT Benchmarks 

BoT's inference time is considerably lower than that of conventional multi-query methods such as ToT.

Logarithmic Inference Time Comparison

The evaluation of the trade-off between model size and performance with Llama3-8B and Llama3-70B on three challenging benchmarks shows that BoT+Llama3-8B has the potential to surpass the Llama3-70B model. 

The trade-off between model size and performance

The ablation study based on disabling the problem distiller shows an accuracy decline on benchmark problems such as Game of 24. Disabling the meta buffer causes both Llama3-70B and GPT-4 models to show a decline in performance. 

Ablation study of problem distiller

Ablation study of meta buffer

Final Words

Buffer of Thoughts is a valuable approach for applying LLMs to complex tasks that require multi-step reasoning or information retrieval. Thought-augmented reasoning significantly increases the accuracy, efficiency, and robustness of LLMs. The BoT approach demonstrates state-of-the-art performance on ten challenging task benchmarks and offers great value for further research.

References


  1. Buffer of Thoughts Research Paper
  2. BoT Git Repo
  3. Chain-of-Thought Prompting Method
  4. Graph-of-Thoughts
  5. Few-shot Learning with Prompting Methods
  6. Reasoning with Language Model Prompting


Sachin Tripathi

Sachin Tripathi is the Manager of AI Research at AIM, with over a decade of experience in AI and Machine Learning. An expert in generative AI and large language models (LLMs), Sachin excels in education, delivering effective training programs. His expertise also includes programming, big data analytics, and cybersecurity. Known for simplifying complex concepts, Sachin is a leading figure in AI education and professional development.
