PROSPER: Extracting Protocol Specifications Using Large Language Models

Cited by: 3
Authors
Sharma, Prakhar [1]
Yegneswaran, Vinod [1]
Affiliation
[1] SRI International, Menlo Park, CA 94025, USA
Keywords
Large language models; request for comments; protocol specifications; protocol FSMs; automated extraction;
DOI
10.1145/3626111.3628205
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We explore the application of Large Language Models (LLMs), specifically GPT-3.5-turbo, to extracting specifications from, and automating the understanding of, networking protocols described in Internet Request for Comments (RFC) documents. LLMs have proven successful in specialized domains such as medical and legal text understanding, and this work investigates their potential for automatically comprehending RFCs. We develop Artifact Miner, a tool that extracts diagram artifacts from RFCs. We then couple the extracted artifacts with natural-language text to extract protocol automata using GPT-3.5-turbo (ChatGPT), and present our zero-shot and few-shot extraction results. We call this framework for finite-state machine (FSM) extraction 'PROSPER: Protocol Specification Miner'. We compare PROSPER with existing state-of-the-art techniques for extracting protocol FSM states and transitions. Our experiments indicate that using artifacts together with text for extraction can lead to fewer false positives and better accuracy for both extracted states and transitions. Finally, we discuss efficient prompt engineering techniques, the errors we encountered, and the pitfalls of using LLMs for knowledge extraction from specialized domains such as RFC documents.
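To make the abstract's comparison of extracted states and transitions more concrete, the following is a minimal Python sketch of set-based scoring of an extracted FSM against a hand-built reference. It is not PROSPER's evaluation code: the `Transition`, `states`, and `score` names are hypothetical, exact string matching is an assumed simplification, the reference edges are a small subset of the connection-establishment transitions in RFC 793's TCP connection state diagram, and the "extracted" set is invented for illustration rather than taken from the paper's results.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """One FSM edge: source state, triggering event, destination state."""
    src: str
    event: str
    dst: str


def states(transitions: set[Transition]) -> set[str]:
    """Collect every state that appears as a source or destination."""
    return {t.src for t in transitions} | {t.dst for t in transitions}


def score(extracted: set, reference: set) -> dict:
    """Set-based precision/recall of extracted items against a reference.

    Items must be hashable (state names or Transition tuples) and are
    matched exactly; no tolerance for paraphrased event names is applied.
    """
    tp = len(extracted & reference)   # correctly extracted
    fp = len(extracted - reference)   # extracted but absent from the reference
    fn = len(reference - extracted)   # in the reference but missed
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Reference: a small subset of the connection-establishment edges in the
# TCP connection state diagram of RFC 793.
reference = {
    Transition("CLOSED", "passive OPEN", "LISTEN"),
    Transition("CLOSED", "active OPEN", "SYN-SENT"),
    Transition("LISTEN", "rcv SYN", "SYN-RECEIVED"),
    Transition("SYN-SENT", "rcv SYN,ACK", "ESTABLISHED"),
    Transition("SYN-RECEIVED", "rcv ACK of SYN", "ESTABLISHED"),
}

# Hypothetical model output (invented for illustration): one spurious edge
# is present and the SYN-RECEIVED -> ESTABLISHED edge is missing.
extracted = {
    Transition("CLOSED", "passive OPEN", "LISTEN"),
    Transition("CLOSED", "active OPEN", "SYN-SENT"),
    Transition("LISTEN", "rcv SYN", "SYN-RECEIVED"),
    Transition("SYN-SENT", "rcv SYN,ACK", "ESTABLISHED"),
    Transition("LISTEN", "rcv ACK", "ESTABLISHED"),
}

if __name__ == "__main__":
    print("transitions:", score(extracted, reference))
    print("states:", score(states(extracted), states(reference)))
```

With this toy data, transition-level scoring gives 4 true positives, 1 false positive, and 1 false negative (precision and recall of 0.8), while state-level scoring is perfect (precision and recall of 1.0). This is why states and transitions are worth scoring separately: a model can name every state correctly and still get edges wrong. A real evaluation would likely also need to normalize LLM wording (for example, "SYN RCVD" versus "SYN-RECEIVED") before matching.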
Pages: 41-47
Page count: 7
Related papers (50 in total)
  • [1] Extracting goal models from natural language requirement specifications
    Das, Souvick
    Deb, Novarun
    Cortesi, Agostino
    Chaki, Nabendu
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2024, 211
  • [2] Extracting Software Product Line Feature Models from Natural Language Specifications
    Sree-Kumar, Anjali
    Planas, Elena
    Clariso, Robert
    [J]. SPLC'18: PROCEEDINGS OF THE 22ND INTERNATIONAL SYSTEMS AND SOFTWARE PRODUCT LINE CONFERENCE, VOL 1, 2018, : 43 - 53
  • [3] Extracting Training Data from Large Language Models
    Carlini, Nicholas
    Tramer, Florian
    Wallace, Eric
    Jagielski, Matthew
    Herbert-Voss, Ariel
    Lee, Katherine
    Roberts, Adam
    Brown, Tom
    Song, Dawn
    Erlingsson, Ulfar
    Oprea, Alina
    Raffel, Colin
    [J]. PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, 2021, : 2633 - 2650
  • [4] Using Large Pretrained Language Models for Answering User Queries from Product Specifications
    Roy, Kalyani
    Shah, Smit
    Pai, Nithish
    Ramtej, Jaidam
    Nadkarn, Prajit Prashant
    Banerjee, Jyotirmoy
    Goyal, Pawan
    Kumar, Surender
    [J]. WORKSHOP ON E-COMMERCE AND NLP (ECNLP 3), 2020, : 35 - 39
  • [5] SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications
    Ghosh, Sayontan
    Singh, Amanpreet
    Merenstein, Alex
    Su, Wei
    Smolka, Scott A.
    Zadok, Erez
    Balasubramanian, Niranjan
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2166 - 2176
  • [6] Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models
    Shyr, Cathy
    Hu, Yan
    Bastarache, Lisa
    Cheng, Alex
    Hamid, Rizwan
    Harris, Paul
    Xu, Hua
    [J]. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2024, 8 (02) : 438 - 461
  • [7] Generating Specifications from Requirements Documents for Smart Devices Using Large Language Models (LLMs)
    Lutze, Rainer
    Waldhoer, Klemens
    [J]. HUMAN-COMPUTER INTERACTION, PT I, HCI 2024, 2024, 14684 : 94 - 108
  • [8] Extracting Domain Models from Textual Requirements in the Era of Large Language Models
    Arulmohan, Sathurshan
    Meurs, Marie-Jean
    Mosser, Sebastien
    [J]. 2023 ACM/IEEE INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS COMPANION, MODELS-C, 2023, : 580 - 587
  • [9] Extracting Design Information from Natural Language Specifications
    Harris, Ian G.
    [J]. 2012 49TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2012, : 1252 - 1253