Counterexample Guided Inductive Synthesis Using Large Language Models and Satisfiability Solving

被引：3

作者：

Jha, Sumit Kumar ^{[1
]}

Jha, Susmit ^{[2
]}

Lincoln, Patrick ^{[2
]}

Bastian, Nathaniel D. ^{[3
]}

Velasquez, Alvaro ^{[4
,5
]}

Ewetz, Rickard

Neema, Sandeep ^{[6
]}

机构：

[1] Florida Int Univ, Dept Comp Sci, Miami, FL 33199 USA

[2] SRI Int, Comp Sci Lab, Menlo Pk, CA USA

[3] US Mil Acad, Army Cyber Inst, West Point, NY USA

[4] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA

[5] Univ Cent Florida, Elect & Comp Engn, Orlando, FL USA

[6] Vanderbilt Univ, Elect & Comp Engn, Nashville, TN USA

来源：

MILCOM 2023 - 2023 IEEE MILITARY COMMUNICATIONS CONFERENCE | 2023年

基金：

美国国家科学基金会;

关键词：

D O I：

10.1109/MILCOM58377.2023.10356332

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Generative large language models (LLMs) can follow human-provided instruction prompts and generate human-like responses. Apart from natural language responses, they have been found to be effective at generating formal artifacts such as code, plans, and logical specifications. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence - a phenomenon often referred to as hallucinations. This limitation makes it difficult to use these models to synthesize formal artifacts used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.

引用

页数：6

共 50 条

[1] Counterexample guided inductive optimization based on satisfiability modulo theories
Araujo, Rodrigo F.
Albuquerque, Higo E.
de Bessa, Iury V.
Cordeiro, Lucas C.
Chaves Filho, Joao E.
SCIENCE OF COMPUTER PROGRAMMING, 2018, 165 : 3 - 23
[2] Counterexample Guided Inductive Synthesis Modulo Theories
Abate, Alessandro
David, Cristina
Kesseli, Pascal
Kroening, Daniel
Polgreen, Elizabeth
COMPUTER AIDED VERIFICATION (CAV 2018), PT I, 2018, 10981 : 270 - 288
[3] Counterexample-guided inductive synthesis for probabilistic systems
Ceska, Milan
Hensel, Christian
Junges, Sebastian
Katoen, Joost-Pieter
FORMAL ASPECTS OF COMPUTING, 2021, 33 (4-5) : 637 - 667
[4] Counterexample-Guided Synthesis of Perception Models and Control
Ghosh, Shromona
Pant, Yash Vardhan
Ravanbakhsh, Hadi
Seshia, Sanjit A.
2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 3447 - 3454
[5] A Novel Counterexample-Guided Inductive Synthesis Framework for Barrier Certificate Generation
Ding, Mi
Lin, Kaipeng
Lin, Wang
Ding, Zuohua
2022 IEEE 33RD INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2022), 2022, : 263 - 273
[6] SAT-Based Counterexample-Guided Inductive Synthesis of Distributed Controllers
Chukharev, Konstantin
Suvorov, Dmitrii
Chivilikhin, Daniil
Vyatkin, Valeriy
IEEE ACCESS, 2020, 8 : 207485 - 207498
[7] Inductive reasoning in humans and large language models
Han, Simon Jerome
Ransom, Keith J.
Perfors, Andrew
Kemp, Charles
COGNITIVE SYSTEMS RESEARCH, 2024, 83
[8] Solving Proof Block Problems Using Large Language Models
Poulsen, Seth
Sarsa, Sami
Prather, James
Leinonen, Juho
Becker, Brett A.
Hellas, Arto
Denny, Paul
Reeves, Brent N.
PROCEEDINGS OF THE 55TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE 2024, VOL. 1, 2024, : 1063 - 1069
[9] Solving satisfiability problems using logic synthesis and reconfigurable hardware
Suyama, T
Yokoo, M
Sawada, H
PROCEEDINGS OF THE THIRTY-FIRST HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOL VII: SOFTWARE TECHNOLOGY TRACK, 1998, : 179 - 186
[10] Dehallucinating Large Language Models Using Formal Methods Guided Iterative Prompting
Jha, Susmit
Jha, Sumit Kumar
Lincoln, Patrick
Bastian, Nathaniel D.
Velasquez, Alvaro
Neema, Sandeep
2023 IEEE INTERNATIONAL CONFERENCE ON ASSURED AUTONOMY, ICAA, 2023, : 149 - 152

← 1 2 3 4 5 →