The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts

Cited by: 11
Authors
Savelka, Jaromir [1 ]
Ashley, Kevin D. [2 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Sch Law, Pittsburgh, PA 15260 USA
Source
Keywords
legal text analytics; large language models (LLM); zero-shot classification; semantic annotation; text annotation; CLASSIFICATION; EXTRACTION; DECISIONS; SEARCH
DOI
10.3389/frai.2023.1279794
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to semantically annotate legal texts in zero-shot learning settings. Given the transition to mature generative AI systems, we examine the performance of GPT-4 and GPT-3.5-turbo(-16k), comparing it to the previous generation of GPT models, on three legal text annotation tasks involving diverse documents such as adjudicatory opinions, contractual clauses, or statutory provisions. We also compare the models' performance and cost to better understand the trade-offs. We found that the GPT-4 model clearly outperforms the GPT-3.5 models on two of the three tasks. The cost-effective GPT-3.5-turbo matches the performance of the 20x more expensive text-davinci-003 model. While one can annotate multiple data points within a single prompt, the performance degrades as the size of the batch increases. This work provides valuable information relevant for many practical applications (e.g., in contract review) and research projects (e.g., in empirical legal studies). Legal scholars and practicing lawyers alike can leverage these findings to guide their decisions in integrating LLMs in a wide range of workflows involving semantic annotation of legal texts.
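The zero-shot annotation setting described in the abstract — asking a GPT model to assign a semantic type to a legal text span without any training examples — can be illustrated with a minimal sketch. The label set, prompt wording, and `call_llm` stub below are illustrative assumptions, not taken from the paper; in practice `call_llm` would invoke a GPT model (e.g., via the OpenAI chat completions endpoint), and the response would be mapped back onto the label set.

```python
# Minimal sketch of zero-shot semantic annotation of a legal sentence.
# LABELS and the prompt template are hypothetical, for illustration only.

LABELS = ["FindingOfFact", "LegalRule", "Reasoning", "Conclusion"]

def build_prompt(sentence: str) -> str:
    """Compose a zero-shot classification prompt for one sentence."""
    return (
        "Classify the following sentence from a court opinion into exactly "
        f"one of these categories: {', '.join(LABELS)}.\n"
        f"Sentence: {sentence}\n"
        "Answer with the category name only."
    )

def parse_label(response: str) -> str:
    """Map the model's free-text reply back onto the label set."""
    reply = response.strip().lower()
    for label in LABELS:
        if label.lower() in reply:
            return label
    return "Unknown"  # fall back when the reply matches no known label

def call_llm(prompt: str) -> str:
    # Placeholder for an actual GPT API call; returns a fixed reply so
    # the sketch runs without network access or an API key.
    return "LegalRule"

def annotate(sentences):
    """Annotate each sentence independently (batch size 1)."""
    return [parse_label(call_llm(build_prompt(s))) for s in sentences]

print(annotate(["A contract requires offer, acceptance, and consideration."]))
```

Annotating one sentence per prompt, as here, reflects the paper's observation that packing many data points into a single prompt degrades performance as the batch grows.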
Pages: 14
Related papers (50 records)
  • [1] Unlocking Practical Applications in Legal Domain: Evaluation of GPT for Zero-Shot Semantic Annotation of Legal Texts
    Savelka, Jaromir
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND LAW, ICAIL 2023, 2023, : 447 - 451
  • [2] Large Language Models are Zero-Shot Reasoners
    Kojima, Takeshi
    Gu, Shixiang Shane
    Reid, Machel
    Matsuo, Yutaka
    Iwasawa, Yusuke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [3] Large Language Models as Zero-Shot Conversational Recommenders
    He, Zhankui
    Xie, Zhouhang
    Jha, Rahul
    Steck, Harald
    Liang, Dawen
    Feng, Yesu
    Majumder, Bodhisattwa Prasad
    Kallus, Nathan
    McAuley, Julian
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 720 - 730
  • [4] Zero-Shot Classification of Art With Large Language Models
    Tojima, Tatsuya
    Yoshida, Mitsuo
    IEEE ACCESS, 2025, 13 : 17426 - 17439
  • [5] ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models
    Mekala, Dheeraj
    Wolfe, Jason
    Roy, Subhro
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5792 - 5799
  • [6] Large Language Models are Zero-Shot Rankers for Recommender Systems
    Hou, Yupeng
    Zhang, Junjie
    Lin, Zihan
    Lu, Hongyu
    Xie, Ruobing
    McAuley, Julian
    Zhao, Wayne Xin
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT II, 2024, 14609 : 364 - 381
  • [7] Large Language Models Are Zero-Shot Time Series Forecasters
    Gruver, Nate
    Finzi, Marc
    Qiu, Shikai
    Wilson, Andrew Gordon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Examining Zero-Shot Vulnerability Repair with Large Language Models
    Pearce, Hammond
    Tan, Benjamin
    Ahmad, Baleegh
    Karri, Ramesh
    Dolan-Gavitt, Brendan
    2023 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP, 2023, : 2339 - 2356
  • [9] Revisiting Large Language Models as Zero-shot Relation Extractors
    Li, Guozheng
    Wang, Peng
    Ke, Wenjun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6877 - 6892