The rise of large language models (LLMs) has sparked a revolution across numerous artificial intelligence domains, with mathematical problem-solving a prominent area of advancement. However, applying LLMs to formal theorem proving remains a significant challenge: proof assistants demand strict, machine-checkable logical steps, whereas LLMs reason informally, relying on approximation and patterns learned from data. This article explores DeepSeek-Prover-V2, a new LLM designed to bridge this gap by breaking complex theorems down into a hierarchy of simpler, independently solvable subtasks, progressively working toward the overarching objective of proving the theorem.
Table of Contents
- What is DeepSeek-Prover-V2?
- Architecture and Methodology
- Key Features and Innovations
- Experimental Results and Benchmarks
- Technical Deep Dive
Let’s start by understanding what DeepSeek-Prover-V2 is.
What is DeepSeek-Prover-V2?
DeepSeek-Prover-V2 is an open-source large language model specifically designed for formal theorem proving within the Lean 4 environment. It distinguishes itself by employing a unique training paradigm that leverages initialization data collected through a recursive theorem-proving pipeline, powered by DeepSeek-V3. This innovative approach enables it to effectively integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, has achieved state-of-the-art performance in neural theorem proving across a range of challenging benchmarks.
Architecture and Methodology
The architecture of DeepSeek-Prover-V2 is centered around a recursive proof search strategy that involves subgoal decomposition. The methodology leverages DeepSeek-V3 to decompose complex theorems into a sequence of smaller, more manageable lemmas, effectively mirroring the problem-solving strategies employed by human mathematicians.
This decomposition process involves prompting DeepSeek-V3 to generate high-level proof sketches in natural language while simultaneously formalizing these proof steps in Lean 4, resulting in a structured sequence of subgoals. To keep the computational cost of proof search manageable, a smaller 7B prover model then handles the search for each decomposed subgoal.
*Figure: Overview of the cold-start data collection process employed by DeepSeek-Prover-V2.*

*Figure: An example of how decomposed subgoals are translated into a series of lemma statements.*
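In the same spirit, the fragment below is a hypothetical Lean 4 illustration of this translation, assuming a Mathlib-style setup; the statements and names are invented for this sketch, not taken from the paper. Each decomposed subgoal becomes a standalone lemma whose proof is deferred with `sorry`:

```lean
-- Hypothetical illustration: a target theorem is split into two
-- lemma statements whose proofs are deferred with `sorry`, marking
-- them as subgoals for the 7B prover model to close.

lemma step1 (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  sorry  -- subgoal 1

lemma step2 (a b : Nat) (h : a ≤ b + 1) : a < b + 2 := by
  sorry  -- subgoal 2

-- Once both subgoals are proved, the original theorem follows by
-- composing the lemmas.
theorem main_goal (a b : Nat) (h : a ≤ b) : a < b + 2 :=
  step2 a b (step1 a b h)
```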
Furthermore, DeepSeek-Prover-V2 incorporates a curriculum learning framework that leverages the decomposed subgoals to generate conjectural theorems, progressively increasing the difficulty of training tasks to guide the model’s learning process. Once the decomposed steps of a challenging problem are resolved, the complete step-by-step formal proof is paired with the corresponding chain-of-thought from DeepSeek-V3 to create high-quality cold-start reasoning data. This cold-start data then serves as the foundation for a reinforcement learning stage, further enhancing the model’s formal theorem proving capabilities.
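As a rough illustration of the curriculum idea, the Python sketch below orders subgoal-derived proving tasks from easier to harder; the `ProvingTask` structure and the depth-based difficulty proxy are assumptions made for this sketch, not details from the paper.

```python
# Illustrative sketch: ordering subgoal-derived tasks into a curriculum
# of increasing difficulty. "Difficulty" is approximated here by how
# deep in the decomposition a task appeared and how many subgoals it
# was split into; the real framework may use different signals.

from dataclasses import dataclass

@dataclass
class ProvingTask:
    statement: str      # Lean 4 theorem or lemma statement
    depth: int          # decomposition depth at which it appeared
    num_subgoals: int   # how many lemmas its proof was split into

def build_curriculum(tasks: list[ProvingTask]) -> list[ProvingTask]:
    """Order tasks so training starts from leaf subgoals (small, easy)
    and ends at the original, fully composed theorems (hard)."""
    return sorted(tasks, key=lambda t: (-t.depth, t.num_subgoals))

tasks = [
    ProvingTask("theorem main_goal ...", depth=0, num_subgoals=2),
    ProvingTask("lemma step1 ...", depth=1, num_subgoals=0),
    ProvingTask("lemma step2 ...", depth=1, num_subgoals=0),
]
for task in build_curriculum(tasks):
    print(task.statement)  # leaf lemmas print first, main_goal last
```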
Key Features and Innovations
DeepSeek-Prover-V2 introduces several key features and innovations that contribute to its state-of-the-art performance:
Recursive Proof Search via Subgoal Decomposition
This core feature enables the model to break down complex theorems into a hierarchy of simpler subgoals, significantly enhancing proof search efficiency.
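A minimal sketch of this recursive decompose-and-solve loop appears below; `try_prove_7b` and `decompose_v3` are hypothetical stand-ins for calls to the 7B prover and to DeepSeek-V3, not the paper's actual interfaces.

```python
# Sketch of recursive proof search via subgoal decomposition. The two
# helper stubs stand in for model calls and are assumptions, not the
# paper's API.

def try_prove_7b(goal: str) -> str | None:
    """Attempt a direct Lean 4 proof of `goal` with the 7B prover;
    a real implementation would also run the Lean verifier and
    return None when no verified proof is found."""
    ...

def decompose_v3(goal: str) -> list[str]:
    """Ask DeepSeek-V3 to split `goal` into simpler lemma statements."""
    ...

def prove(goal: str, max_depth: int = 3) -> str | None:
    # Base case: the small prover may already close the goal outright.
    proof = try_prove_7b(goal)
    if proof is not None or max_depth == 0:
        return proof
    # Recursive case: decompose, prove each subgoal, then compose.
    subproofs = []
    for subgoal in decompose_v3(goal) or []:
        subproof = prove(subgoal, max_depth - 1)
        if subproof is None:
            return None  # one unsolved subgoal blocks the composition
        subproofs.append(subproof)
    return "\n\n".join(subproofs)
```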
Curriculum Learning for Subgoal-based Theorem Proving
This framework leverages decomposed subgoals to generate training tasks of increasing difficulty, providing denser training signals and more effective learning.
Unifying Informal Reasoning and Proof Formalization
DeepSeek-Prover-V2 unifies informal mathematical reasoning and formal proof construction within a single model, using DeepSeek-V3 for lemma decomposition and a 7B prover model to complete the formal proof details.
Two-Stage Training Pipeline
The model is trained through a two-stage pipeline that establishes two complementary proof generation modes: a high-efficiency non-Chain-of-Thought (non-CoT) mode for rapid generation of formal Lean proof codes and a high-precision Chain-of-Thought (CoT) mode for articulating intermediate reasoning steps.
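To make the two modes concrete, here is a hypothetical pair of prompt templates; the paper does not publish its exact prompts, so these are purely illustrative.

```python
# Illustrative prompt templates contrasting the two generation modes.
# These are assumptions for this sketch, not the paper's actual prompts.

NON_COT_PROMPT = """Complete the following Lean 4 proof.
Output only the Lean code, with no explanation.

{theorem_statement}
"""

COT_PROMPT = """Complete the following Lean 4 proof.
First write a step-by-step natural-language proof sketch, then
translate that sketch into Lean 4 code.

{theorem_statement}
"""

def build_prompt(theorem_statement: str, use_cot: bool) -> str:
    # CoT trades generation speed for more explicit reasoning.
    template = COT_PROMPT if use_cot else NON_COT_PROMPT
    return template.format(theorem_statement=theorem_statement)
```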
Experimental Results and Benchmarks
DeepSeek-Prover-V2 has been rigorously evaluated across diverse benchmark datasets of formal theorem proving, demonstrating its exceptional performance in both high-school competition problems and undergraduate-level mathematics.
MiniF2F Benchmark: On the MiniF2F-test dataset, the model achieved a new state of the art, reaching an 82.4% pass rate with only 32 samples under the CoT generation strategy and improving further to 88.9% at Pass@8192 (the Pass@K metric is defined below).
ProofNet and PutnamBench: The model also demonstrated strong generalization capabilities on undergraduate-level benchmarks, achieving significant improvements in pass rates on ProofNet-test and solving a substantial number of challenging problems from PutnamBench.
ProverBench: The authors introduced ProverBench, a new benchmark dataset containing 325 formalized problems, including 15 from the prestigious AIME competitions. DeepSeek-Prover-V2-671B achieved impressive results on this benchmark, further highlighting its advanced mathematical reasoning capabilities.
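Throughout these results, Pass@K credits the model with a problem if at least one of K independently sampled proof attempts is accepted by the Lean 4 verifier. In general notation (a standard definition, not specific to this paper):

```latex
% A problem counts as solved under Pass@K if any of the K sampled
% proof attempts p_1, ..., p_K passes Lean verification.
\[
  \mathrm{Pass@}K \;=\; \Pr\bigl[\,\exists\, i \in \{1, \dots, K\} :
    \mathrm{verify}(p_i) = \mathrm{accepted}\,\bigr]
\]
```

Larger K therefore trades compute for coverage, which is why the 88.9% figure at Pass@8192 sits above the 82.4% figure obtained with only 32 samples.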
Technical Deep Dive
A critical aspect of DeepSeek-Prover-V2’s methodology is the cold-start data collection process. This process involves a recursive theorem-proving framework, utilizing DeepSeek-V3 as a unified tool for both subgoal decomposition and lemma formalization within the Lean 4 proof assistant.
DeepSeek-V3 is prompted to generate natural-language proof sketches while simultaneously formalizing them into Lean statements with “sorry” placeholders for omitted proof details. A 7B prover model then recursively solves the decomposed subgoals, and by combining these subgoal proofs, a complete formal proof for the original complex problem is constructed.
This composed proof is appended to DeepSeek-V3's original chain-of-thought, creating high-quality cold-start training data for formal mathematical reasoning. This approach yields hundreds of high-quality synthetic cold-start training examples, which serve as the foundation for training DeepSeek-Prover-V2.
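A schematic view of how one such training example might be assembled is sketched below; the field names and structure are invented for illustration, since the paper does not publish an exact data schema.

```python
# Sketch of assembling one cold-start training example: the composed
# formal proof is paired with the informal chain-of-thought that
# produced the decomposition. Field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ColdStartExample:
    theorem_statement: str   # the original Lean 4 theorem
    chain_of_thought: str    # DeepSeek-V3's informal reasoning
    formal_proof: str        # composed from solved subgoal proofs

def make_cold_start_example(statement: str, v3_cot: str,
                            subgoal_proofs: list[str]) -> ColdStartExample:
    # Compose the verified subgoal proofs into the complete proof and
    # attach the chain-of-thought, so the trained model learns to link
    # informal reasoning with formal proof construction.
    composed_proof = "\n\n".join(subgoal_proofs)
    return ColdStartExample(statement, v3_cot, composed_proof)
```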
Final Words
DeepSeek-Prover-V2 represents a significant step forward in the quest to bridge the gap between informal mathematical reasoning and formal theorem proving. Its innovative approach to data construction, recursive proof search, and curriculum learning has yielded state-of-the-art results across a range of challenging benchmarks.
References
- DeepSeek-Prover-V2 Paper
- DeepSeek-Prover-V2 GitHub repository: https://github.com/deepseek-ai/DeepSeek-Prover-V2
- DeepSeek-V3 Technical Report: https://arxiv.org/abs/2412.19437