LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

Cited: 0
Authors
Yang, Kaiyu [1 ]
Swope, Aidan M. [2 ]
Gu, Alex
Chalamala, Rahul [1 ]
Song, Peiyang [3 ]
Yu, Shixing [4 ]
Godil, Saad
Prenger, Ryan [2 ]
Anandkumar, Anima [1 ,2 ]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
[2] NVIDIA, Santa Clara, CA USA
[3] UC Santa Barbara, Santa Barbara, CA USA
[4] UT Austin, Austin, TX USA
Funding
U.S. National Science Foundation
Keywords
DOI
(none)
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables programmatic interaction with the proof environment. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection, a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): an LLM-based prover augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 98,734 theorems and proofs extracted from Lean's math library. It features a challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release them under a permissive MIT license to facilitate further research.
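The premise-selection step the abstract describes can be pictured as dense retrieval: encode the current proof state and every accessible premise as vectors, then rank premises by similarity to the state. The sketch below is a minimal pure-Python illustration of that ranking step only; the three-dimensional embeddings are made up for the example (ReProver's actual learned encoder, and how it produces embeddings, are not shown here), and the premise names are mathlib-style lemma names used purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_premises(state_emb, premises):
    """Return premise names sorted from most to least similar to the proof state."""
    return sorted(premises, key=lambda name: -cosine(state_emb, premises[name]))

# Toy embeddings (hypothetical values, not real encoder outputs).
premises = {
    "nat.add_comm":       [1.0, 0.0, 0.0],
    "nat.mul_comm":       [0.9, 0.1, 0.0],
    "list.length_append": [0.0, 0.0, 1.0],
}
state = [1.0, 0.05, 0.0]  # embedding of the current proof state

print(rank_premises(state, premises))
# The top-ranked premises would then be passed to the tactic generator.
```

In the paper's setting, the candidate set is restricted to premises that are actually accessible at the current proof state (determined by LeanDojo's program analysis), and similarly-scored but unused premises serve as hard negatives during retriever training.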
Pages: 40
Related Papers
50 records
  • [31] Learning Customized Visual Models with Retrieval-Augmented Knowledge
    Liu, Haotian
    Son, Kilho
    Yang, Jianwei
    Liu, Ce
    Gao, Jianfeng
    Lee, Yong Jae
    Li, Chunyuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15148 - 15158
  • [32] REALM: Retrieval-Augmented Language Model Pre-Training
    Guu, Kelvin
    Lee, Kenton
    Tung, Zora
    Pasupat, Panupong
    Chang, Ming-Wei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [33] An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models
    Wang, Mengzhao
    Wu, Haotian
    Ke, Xiangyu
    Gao, Yunjun
    Xu, Xiaoliang
    Chen, Lu
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (12): : 4333 - 4336
  • [34] Quantitative Evaluation of Using Large Language Models and Retrieval-Augmented Generation in Computer Science Education
    Wang, Kevin Shukang
    Lawrence, Ramon
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025, : 1183 - 1189
  • [35] Layered Query Retrieval: An Adaptive Framework for Retrieval-Augmented Generation in Complex Question Answering for Large Language Models
    Huang, Jie
    Wang, Mo
    Cui, Yunpeng
    Liu, Juan
    Chen, Li
    Wang, Ting
    Li, Huan
    Wu, Jinming
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [36] Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models
    Louis, Antoine
    van Dijck, Gijs
    Spanakis, Gerasimos
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22266 - 22275
  • [37] Facilitating university admission using a chatbot based on large language models with retrieval-augmented generation
    Chen, Zheng
    Zou, Di
    Xie, Haoran
    Lou, Huajie
    Pang, Zhiyuan
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2024, 27 (04): : 454 - 470
  • [38] Optimized interaction with Large Language Models: A practical guide to Prompt Engineering and Retrieval-Augmented Generation
    Fink, Anna
    Rau, Alexander
    Kotter, Elmar
    Bamberg, Fabian
    Russe, Maximilian Frederik
    RADIOLOGIE, 2025,
  • [40] CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
    Lyu, Yuanjie
    Li, Zhiyu
    Niu, Simin
    Xiong, Feiyu
    Tang, Bo
    Wang, Wenjin
    Wu, Hao
    Liu, Huanyong
    Xu, Tong
    Chen, Enhong
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)