Semantic role extraction in law texts: a comparative analysis of language models for legal information extraction

Times cited: 0

Authors:

Bakker, Roos M. [1,2]

Schoevers, Akke J. [1,3]

van Drie, Romy A. N. [1]

Schraagen, Marijn P. [3]

de Boer, Maaike H. T. [1]

Affiliations:

[1] TNO, Dept Data Sci, The Hague, Netherlands

[2] Leiden Univ, Ctr Linguist, Leiden, Netherlands

[3] Univ Utrecht, Nat Language Proc, Utrecht, Netherlands

Source:

ARTIFICIAL INTELLIGENCE AND LAW | 2025

Keywords:

Semantic role labelling; Large language models; Legislation; Legal information extraction; Legal semantic roles

DOI:

10.1007/s10506-025-09437-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Norms are essential in our society: they dictate how individuals should behave and interact within a community. They can be recorded in laws or other written sources, but interpretations of those sources often differ. This is where formalisations offer a solution: they express an interpretation of a source of norms in a transparent manner. However, creating these interpretations is labour-intensive, and natural language processing techniques can support this process. Previous work showed the potential of transformer-based models for Dutch law texts. In this paper, we (1) introduce a dataset of 2335 English sentences annotated with legal semantic roles in accordance with the Flint framework; (2) fine-tune a collection of language models on this dataset; and (3) query two non-fine-tuned generative large language models (LLMs). This allows us to compare the performance of fine-tuned domain-specific, task-specific and general language models with that of non-fine-tuned generative LLMs. The results show that the models fine-tuned on our dataset perform best (accuracy around 0.88). Furthermore, domain-specific models outperform general models, indicating that domain knowledge adds value for this task. Finally, the different methods of querying LLMs perform unsatisfactorily, with maximum accuracy scores around 0.6. This indicates that for specific tasks, such as this adaptation of semantic role labelling, annotating data and fine-tuning a smaller language model is preferable to querying a generative LLM, especially when domain-specific models are available.
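The abstract compares models by accuracy. As a minimal sketch, assuming each annotated unit (for example a phrase) receives exactly one legal semantic role, accuracy is the share of units whose predicted role matches the gold annotation. The role inventory, the example sentence and both sets of predictions below are hypothetical illustrations, not taken from the paper's dataset or its Flint label set.

```python
from typing import Sequence

# Hypothetical role inventory, loosely inspired by Flint act frames.
# The label set actually used in the paper's dataset may differ.
ROLES = {"actor", "action", "object", "recipient", "precondition", "none"}


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of annotated units whose predicted role equals the gold role."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("accuracy is undefined for an empty sequence")
    # Reject labels outside the (hypothetical) role inventory.
    unknown = (set(gold) | set(predicted)) - ROLES
    if unknown:
        raise ValueError(f"unknown role labels: {sorted(unknown)}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Invented example sentence, split into four units:
# "The minister" | "may grant" | "a permit" | "to the applicant"
gold = ["actor", "action", "object", "recipient"]
fine_tuned = ["actor", "action", "object", "recipient"]
prompted_llm = ["actor", "action", "recipient", "none"]

print(accuracy(gold, fine_tuned))    # 1.0
print(accuracy(gold, prompted_llm))  # 0.5
```

Scores such as the reported 0.88 and 0.6 would be this kind of ratio computed over a full test set; how the paper defines its evaluation units is not stated in this record.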

Pages: 35
