The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts

Cited by: 11
Authors
Savelka, Jaromir [1 ]
Ashley, Kevin D. [2 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Sch Law, Pittsburgh, PA 15260 USA
Source
Frontiers in Artificial Intelligence
Keywords
legal text analytics; large language models (LLM); zero-shot classification; semantic annotation; text annotation; CLASSIFICATION; EXTRACTION; DECISIONS; SEARCH;
DOI
10.3389/frai.2023.1279794
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to semantically annotate legal texts in zero-shot learning settings. Given the transition to mature generative AI systems, we examine the performance of GPT-4 and GPT-3.5-turbo(-16k), comparing it to the previous generation of GPT models, on three legal text annotation tasks involving diverse documents such as adjudicatory opinions, contractual clauses, and statutory provisions. We also compare the models' performance and cost to better understand the trade-offs. We found that the GPT-4 model clearly outperforms the GPT-3.5 models on two of the three tasks. The cost-effective GPT-3.5-turbo matches the performance of the 20x more expensive text-davinci-003 model. While one can annotate multiple data points within a single prompt, performance degrades as the batch size increases. This work provides valuable information relevant for many practical applications (e.g., in contract review) and research projects (e.g., in empirical legal studies). Legal scholars and practicing lawyers alike can leverage these findings to guide their decisions in integrating LLMs in a wide range of workflows involving semantic annotation of legal texts.
Pages: 14
Related papers
50 records
  • [21] Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models
    Hillebrand, Lars; Berger, Armin; Deusser, Tobias; Dilmaghani, Tim; Khaled, Mohamed; Kliem, Bernd; Loitz, Ruediger; Pielka, Maren; Leonhard, David; Bauckhage, Christian; Sifa, Rafet
    PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, DOCENG 2023, 2023
  • [22] Zero-shot interpretable phenotyping of postpartum hemorrhage using large language models
    Alsentzer, Emily; Rasmussen, Matthew J.; Fontoura, Romy; Cull, Alexis L.; Beaulieu-Jones, Brett; Gray, Kathryn J.; Bates, David W.; Kovacheva, Vesela P.
    npj Digital Medicine, 2023, 6
  • [23] Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL
    Fan, Ju; Gu, Zihui; Zhang, Songyue; Zhang, Yuxin; Chen, Zui; Cao, Lei; Li, Guoliang; Madden, Samuel; Du, Xiaoyong; Tang, Nan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (11): : 2750 - 2763
  • [24] Effectiveness of large language models in automated evaluation of argumentative essays: finetuning vs. zero-shot prompting
    Wang, Qiao; Gayed, John Maurice
    COMPUTER ASSISTED LANGUAGE LEARNING, 2024
  • [25] Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models
    Deng, Yinlin; Xia, Chunqiu Steven; Peng, Haoran; Yang, Chenyuan; Zhang, Lingming
    PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023, 2023, : 423 - 435
  • [26] Enhancing text-based knowledge graph completion with zero-shot large language models: A focus on semantic enhancement
    Yang, Rui; Zhu, Jiahao; Man, Jianping; Fang, Li; Zhou, Yi
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [27] Harnessing large language models' zero-shot and few-shot learning capabilities for regulatory research
    Meshkin, Hamed; Zirkle, Joel; Arabidarrehdor, Ghazal; Chaturbedi, Anik; Chakravartula, Shilpa; Mann, John; Thrasher, Bradlee; Li, Zhihua
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (05)
  • [29] Zero-Shot ECG Diagnosis with Large Language Models and Retrieval-Augmented Generation
    Yu, Han; Guo, Peikun; Sano, Akane
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 650 - 663
  • [30] Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
    Zhang, Kai; Gutierrez, Bernal Jimenez; Su, Yu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 794 - 812