The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts

Cited: 11
Authors
Savelka, Jaromir [1 ]
Ashley, Kevin D. [2 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Sch Law, Pittsburgh, PA 15260 USA
Keywords
legal text analytics; large language models (LLM); zero-shot classification; semantic annotation; text annotation; CLASSIFICATION; EXTRACTION; DECISIONS; SEARCH
DOI
10.3389/frai.2023.1279794
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to semantically annotate legal texts in zero-shot learning settings. Given the transition to mature generative AI systems, we examine the performance of GPT-4 and GPT-3.5-turbo(-16k), comparing it to the previous generation of GPT models, on three legal text annotation tasks involving diverse documents such as adjudicatory opinions, contractual clauses, and statutory provisions. We also compare the models' performance and cost to better understand the trade-offs. We found that the GPT-4 model clearly outperforms the GPT-3.5 models on two of the three tasks. The cost-effective GPT-3.5-turbo matches the performance of the 20x more expensive text-davinci-003 model. While one can annotate multiple data points within a single prompt, the performance degrades as the size of the batch increases. This work provides valuable information relevant for many practical applications (e.g., in contract review) and research projects (e.g., in empirical legal studies). Legal scholars and practicing lawyers alike can leverage these findings to guide their decisions in integrating LLMs in a wide range of workflows involving semantic annotation of legal texts.
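The zero-shot setup the abstract describes — asking a chat model to assign each legal text a label from a fixed set, optionally batching several data points per prompt — can be sketched as below. This is a minimal illustration, not the paper's actual prompts or label set; the clause texts, category names, and prompt wording are all assumptions, and the API call itself is left as a comment so the sketch stays self-contained.

```python
# Sketch of zero-shot semantic annotation of legal text via an LLM prompt.
# Labels, clauses, and prompt wording are illustrative, not the paper's setup.

def build_annotation_prompt(clauses, labels):
    """Build one prompt asking the model to classify each clause.

    Batching several clauses per prompt lowers cost, but the paper reports
    that accuracy degrades as the batch grows, so batches should stay small.
    """
    label_list = ", ".join(labels)
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(clauses))
    return (
        f"Classify each contractual clause below into exactly one of these "
        f"categories: {label_list}.\n"
        f"Answer with one line per clause in the form '<number>: <category>'.\n\n"
        f"{numbered}"
    )

def parse_annotations(response_text, n_clauses):
    """Parse '<number>: <category>' response lines back into a label list."""
    labels = [None] * n_clauses
    for line in response_text.splitlines():
        num, _, label = line.partition(":")
        if num.strip().isdigit() and label.strip():
            idx = int(num.strip()) - 1
            if 0 <= idx < n_clauses:
                labels[idx] = label.strip()
    return labels

# The prompt would be sent to a chat model (e.g., GPT-4) at temperature 0;
# the network call is omitted here.
prompt = build_annotation_prompt(
    ["The Supplier shall indemnify the Buyer against all third-party claims.",
     "This Agreement is governed by the laws of the State of New York."],
    ["Indemnification", "Governing Law", "Termination"],
)
```

Keeping the response format machine-parseable (one `<number>: <category>` line per clause) is what makes batched annotation practical; the parser tolerates stray lines by ignoring anything that does not match.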
Pages: 14