Counterexample Guided Inductive Synthesis Using Large Language Models and Satisfiability Solving

Cited by: 3
Authors:
Jha, Sumit Kumar [1 ]
Jha, Susmit [2 ]
Lincoln, Patrick [2 ]
Bastian, Nathaniel D. [3 ]
Velasquez, Alvaro [4 ,5 ]
Ewetz, Rickard
Neema, Sandeep [6 ]
Institutions:
[1] Florida Int Univ, Dept Comp Sci, Miami, FL 33199 USA
[2] SRI Int, Comp Sci Lab, Menlo Pk, CA USA
[3] US Mil Acad, Army Cyber Inst, West Point, NY USA
[4] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[5] Univ Cent Florida, Elect & Comp Engn, Orlando, FL USA
[6] Vanderbilt Univ, Elect & Comp Engn, Nashville, TN USA
Funding:
U.S. National Science Foundation
Keywords:
DOI: 10.1109/MILCOM58377.2023.10356332
CLC Classification:
TP (Automation Technology, Computer Technology)
Discipline Code:
0812
Abstract:
Generative large language models (LLMs) can follow human-provided instruction prompts and generate human-like responses. Beyond natural language, they have proven effective at generating formal artifacts such as code, plans, and logical specifications. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results even when their output is syntactically coherent - a phenomenon often referred to as hallucination. This limitation makes it difficult to use these models to synthesize formal artifacts for safety-critical applications. Unlike tasks such as text summarization and question answering, bugs in code, plans, and other formal artifacts produced by LLMs can be catastrophic. We posit that satisfiability modulo theories (SMT) solvers can serve as deductive reasoning engines that analyze the solutions generated by LLMs, produce counterexamples when the solutions are incorrect, and return those counterexamples to the LLMs as feedback, exploiting their dialog capability. This interaction between the inductive LLM and the deductive SMT solver can iteratively steer the LLM toward the correct response. In our experiments, we use planning in the blocks-world domain as the synthesis task for evaluating our approach. We use GPT-4, GPT-3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method lets the user state the planning problem in natural language; even the queries to the SMT solver are generated automatically from natural language. The proposed technique can thus enable non-expert users to describe their problems in natural language, while the combination of LLMs and SMT solvers produces provably correct solutions.
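To make the loop concrete, below is a minimal Python sketch of the counterexample-guided interaction the abstract describes, using the z3-solver package as the deductive engine on a simplified blocks-world encoding (no robot arm; clear-block preconditions only). The query_llm callable, the plan representation as (block, destination) pairs, and the feedback format are illustrative assumptions standing in for the paper's GPT-based dialog machinery, not the authors' actual implementation.

# Hypothetical CEGIS loop: the LLM proposes plans, Z3 checks them, and
# counterexamples are fed back through the prompt. Requires: pip install z3-solver.
from z3 import And, Int, Not, Solver, sat

BLOCKS = ["a", "b", "c"]                                    # running example only
PLACE = {p: i for i, p in enumerate(["table"] + BLOCKS)}    # place name -> index
NAME = {i: p for p, i in PLACE.items()}                     # index -> place name

def pos(x, t):
    # Integer variable: index of the place block x rests on at step t.
    return Int(f"pos_{x}_{t}")

def clear(x, t):
    # Constraint: no block is stacked on top of block x at step t.
    return And([pos(z, t) != PLACE[x] for z in BLOCKS if z != x])

def encode(plan, init):
    # Assert the initial state, then the precondition, effect, and frame
    # axiom of each move(x, y) action in the candidate plan.
    cs = [pos(x, 0) == PLACE[p] for x, p in init.items()]
    for t, (x, y) in enumerate(plan):
        cs.append(clear(x, t))                              # x must be clear
        if y != "table":
            cs.append(clear(y, t))                          # destination must be clear
        cs.append(pos(x, t + 1) == PLACE[y])                # effect of the move
        cs += [pos(z, t + 1) == pos(z, t)                   # frame: others stay put
               for z in BLOCKS if z != x]
    return cs

def verify(plan, init, goal):
    # Return None if the plan provably reaches the goal, else a natural
    # language counterexample to hand back to the LLM.
    T = len(plan)
    s = Solver()
    s.add(*encode(plan, init))
    if s.check() != sat:                                    # a precondition is violated
        return f"plan {plan} is not executable from {init}"
    s.add(Not(And([pos(x, T) == PLACE[p] for x, p in goal.items()])))
    if s.check() == sat:                                    # goal can be missed
        m = s.model()
        final = {x: NAME[m.eval(pos(x, T)).as_long()] for x in BLOCKS}
        return f"counterexample: plan ends in {final}, not {goal}"
    return None                                             # unsat: plan is verified

def cegis(init, goal, query_llm, max_rounds=5):
    # query_llm is a hypothetical wrapper around a chat LLM that returns a
    # plan as a list of (block, destination) pairs.
    prompt = f"Give a blocks-world plan from {init} to {goal}."
    for _ in range(max_rounds):
        plan = query_llm(prompt)
        feedback = verify(plan, init, goal)
        if feedback is None:
            return plan                                     # provably correct plan
        prompt = f"Your plan failed. {feedback}. Give a corrected plan."
    return None

For example, with initial state {'a': 'b', 'b': 'table', 'c': 'table'} and goal {'a': 'table', 'b': 'a'}, a candidate plan [('a', 'table')] is executable but yields the counterexample final state {'a': 'table', 'b': 'table', 'c': 'table'}, which the loop folds into the next prompt; an unsat result on the negated goal certifies the plan.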
Pages: 6
Related Papers (50 total):
  • [21] Human-like problem-solving abilities in large language models using ChatGPT
    Orru, Graziella
    Piarulli, Andrea
    Conversano, Ciro
    Gemignani, Angelo
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2023, 6
  • [22] PassGPT: Password Modeling and (Guided) Generation with Large Language Models
    Rando, Javier
    Perez-Cruz, Fernando
    Hitaj, Briland
    COMPUTER SECURITY - ESORICS 2023, PT IV, 2024, 14347: 164 - 183
  • [23] Inductive reasoning with large language models: A simulated randomized controlled trial for epilepsy
    Goldenholz, Daniel M.
    Goldenholz, Shira R.
    Habib, Sara
    Westover, M. Brandon
    EPILEPSY RESEARCH, 2025, 211
  • [24] Using large language models in psychology
    Demszky, Dorottya
    Yang, Diyi
    Yeager, David
    Bryan, Christopher
    Clapper, Margarett
    Chandhok, Susannah
    Eichstaedt, Johannes
    Hecht, Cameron
    Jamieson, Jeremy
    Johnson, Meghann
    Jones, Michaela
    Krettek-Cobb, Danielle
    Lai, Leslie
    Jonesmitchell, Nirel
    Ong, Desmond
    Dweck, Carol
    Gross, James
    Pennebaker, James
    NATURE REVIEWS PSYCHOLOGY, 2023, 2 (11): 688 - 701
  • [26] Using large language models wisely
    [Anonymous]
    NATURE ASTRONOMY, 2025, 9 (03): 315
  • [27] Large Language Models for Inorganic Synthesis Predictions
    Kim, Seongmin
    Jung, Yousung
    Schrier, Joshua
    JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2024, 146 (29): 19654 - 19659
  • [28] Opinion: On Program Synthesis and Large Language Models
    Huttel, Hans
    COMMUNICATIONS OF THE ACM, 2025, 68 (01) : 33 - 35
  • [29] Injecting structural hints: Using language models to study inductive biases in language learning
    Papadimitriou, Isabel
    Jurafsky, Dan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 8402 - 8413
  • [30] MathAttack: Attacking Large Language Models towards Math Solving Ability
    Zhou, Zihao
    Wang, Qiufeng
    Jin, Mingyu
    Yao, Jie
    Ye, Jianan
    Liu, Wei
    Wang, Wei
    Huang, Xiaowei
    Huang, Kaizhu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38, NO 17, 2024: 19750 - 19758