Semantic role extraction in law texts: a comparative analysis of language models for legal information extraction

Cited by: 0
Authors
Bakker, Roos M. [1, 2]
Schoevers, Akke J. [1, 3]
van Drie, Romy A. N. [1]
Schraagen, Marijn P. [3]
de Boer, Maaike H. T. [1]
Affiliations
[1] TNO, Department of Data Science, The Hague, Netherlands
[2] Leiden University, Centre for Linguistics, Leiden, Netherlands
[3] Utrecht University, Natural Language Processing, Utrecht, Netherlands
Keywords
Semantic role labelling; Large language models; Legislation; Legal information extraction; Legal semantic roles
DOI
10.1007/s10506-025-09437-x
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Norms are essential in our society: they dictate how individuals should behave and interact within a community. They can be laid down in laws or other written sources. Because interpretations of these sources often differ, formalisations offer a solution: they express an interpretation of a source of norms in a transparent manner. However, creating these interpretations is labour-intensive, and natural language processing techniques can support this process. Previous work showed the potential of transformer-based models for Dutch law texts. In this paper, we (1) introduce a dataset of 2335 English sentences annotated with legal semantic roles conforming to the Flint framework; (2) fine-tune a collection of language models on this dataset; and (3) query two non-fine-tuned generative large language models (LLMs). This allows us to compare the performance of fine-tuned domain-specific, task-specific, and general language models with that of non-fine-tuned generative LLMs. The results show that the models fine-tuned on our dataset perform best (accuracy around 0.88). Furthermore, domain-specific models outperform general models, indicating that domain knowledge adds value for this task. Finally, the different methods of querying LLMs perform unsatisfactorily, with maximum accuracy scores around 0.6. This indicates that for specific tasks, such as this adaptation of semantic role labelling, annotating data and fine-tuning a smaller language model is preferable to querying a generative LLM, especially when domain-specific models are available.
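The abstract reports accuracy scores but this record does not say how the role annotations are encoded or at which level accuracy is measured. Purely as an illustration, the sketch below assumes the task is framed as token classification with BIO tags and that accuracy is the fraction of tokens whose predicted tag matches the gold tag. The role names (`actor`, `action`, `object`, `recipient`, `precondition`), the helper functions, and the example sentence are hypothetical and are not taken from the paper or from the Flint specification.

```python
# Hypothetical role inventory, loosely modelled on Flint-style acts;
# the paper's actual tag set may differ.
ROLES = {"actor", "action", "object", "recipient", "precondition"}


def spans_to_bio(n_tokens, spans):
    """Convert (start, end, role) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        tags[start] = f"B-{role}"
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"
    return tags


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


# Hypothetical example sentence (not from the dataset).
tokens = ["The", "minister", "may", "grant", "a", "permit",
          "to", "the", "applicant", "."]
gold = spans_to_bio(len(tokens), [(0, 2, "actor"), (3, 4, "action"),
                                  (4, 6, "object"), (7, 9, "recipient")])
# The "model" only tags "permit" as the object, missing the determiner "a".
pred = spans_to_bio(len(tokens), [(0, 2, "actor"), (3, 4, "action"),
                                  (5, 6, "object"), (7, 9, "recipient")])

print(token_accuracy(gold, pred))  # 0.8
```

Token-level accuracy is only one possible choice; a span-level metric would count the partially labelled object above as a complete miss, so the two can diverge noticeably on the same predictions.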
Pages: 35