Semantic role extraction in law texts: a comparative analysis of language models for legal information extraction

Cited by: 0
Authors
Bakker, Roos M. [1, 2]
Schoevers, Akke J. [1, 3]
van Drie, Romy A. N. [1 ]
Schraagen, Marijn P. [3 ]
de Boer, Maaike H. T. [1 ]
Affiliations
[1] TNO, Dept Data Sci, The Hague, Netherlands
[2] Leiden Univ, Ctr Linguist, Leiden, Netherlands
[3] Univ Utrecht, Nat Language Proc, Utrecht, Netherlands
Keywords
Semantic role labelling; Large language models; Legislation; Legal information extraction; Legal semantic roles;
DOI
10.1007/s10506-025-09437-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Norms are essential in our society: they dictate how individuals should behave and interact within a community. They can be written down in laws or other written sources. Interpretations often differ; this is where formalisations offer a solution. They express an interpretation of a source of norms in a transparent manner. However, creating these interpretations is labour-intensive. Natural language processing techniques can support this process. Previous work showed the potential of transformer-based models for Dutch law texts. In this paper, we (1) introduce a dataset of 2335 English sentences annotated with legal semantic roles in accordance with the Flint framework; (2) fine-tune a collection of language models on this dataset; and (3) query two non-fine-tuned generative large language models (LLMs). This allows us to compare the performance of fine-tuned domain-specific, task-specific, and general language models with that of non-fine-tuned generative LLMs. The results show that models fine-tuned on our dataset perform best (accuracy around 0.88). Furthermore, domain-specific models perform better than general models, indicating that domain knowledge adds value for this task. Finally, the different methods of querying LLMs perform unsatisfactorily, with maximum accuracy scores around 0.6. This indicates that for specific tasks, such as this adaptation of semantic role labelling, annotating data and fine-tuning a smaller language model is preferable to querying a generative LLM, especially when domain-specific models are available.
Pages: 35