LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

Cited by: 0
Authors
Yang, Kaiyu [1 ]
Swope, Aidan M. [2 ]
Gu, Alex
Chalamala, Rahul [1 ]
Song, Peiyang [3 ]
Yu, Shixing [4 ]
Godil, Saad
Prenger, Ryan [2 ]
Anandkumar, Anima [1,2]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
[2] NVIDIA, Santa Clara, CA USA
[3] UC Santa Barbara, Santa Barbara, CA USA
[4] UT Austin, Austin, TX USA
Funding
U.S. National Science Foundation;
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on due to private code, data, and large compute requirements, creating substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables programmatic interaction with the proof environment. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection, a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): an LLM-based prover augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 98,734 theorems and proofs extracted from Lean's math library. It features a challenging data split that requires the prover to generalize to theorems relying on novel premises never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
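The premise-selection step the abstract describes (rank the premises accessible at the current proof state by relevance to the goal, then feed the top candidates to the prover) can be sketched with a toy retriever. This is a minimal illustration, not LeanDojo's or ReProver's actual API: the bag-of-words embedding stands in for ReProver's learned dense encoder, and the names `embed`, `cosine`, and `retrieve` are made up for this sketch.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": token -> count. A stand-in for the
    # dense encoder a retrieval-augmented prover would learn.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(goal, premises, k=2):
    # Rank the accessible premises by similarity to the goal state and
    # return the top k; the prover would condition on these when
    # generating the next tactic.
    g = embed(goal)
    ranked = sorted(premises, key=lambda p: cosine(g, embed(p)), reverse=True)
    return ranked[:k]

premises = [
    "theorem add_comm : a + b = b + a",
    "theorem mul_comm : a * b = b * a",
    "theorem add_assoc : a + b + c = a + (b + c)",
]
goal = "prove a + b = b + a"
print(retrieve(goal, premises, k=1))  # the commutativity-of-addition premise ranks first
```

In the paper's actual setting, the candidate pool is the whole math library rather than three lines, and the retriever is trained with hard negatives (accessible but irrelevant premises) identified via LeanDojo's program analysis, which is what makes the ranking far more discriminative than this word-overlap sketch.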
Pages: 40