Semantic role extraction in law texts: a comparative analysis of language models for legal information extraction

Cited by: 0

Authors:

Bakker, Roos M. [1,2]

Schoevers, Akke J. [1,3]

van Drie, Romy A. N. [1]

Schraagen, Marijn P. [3]

de Boer, Maaike H. T. [1]

Affiliations:

[1] TNO, Dept Data Sci, The Hague, Netherlands

[2] Leiden Univ, Ctr Linguist, Leiden, Netherlands

[3] Univ Utrecht, Nat Language Proc, Utrecht, Netherlands

Source:

ARTIFICIAL INTELLIGENCE AND LAW | 2025

Keywords:

Semantic role labelling; Large language models; Legislation; Legal information extraction; Legal semantic roles;

DOI:

10.1007/s10506-025-09437-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Norms are essential in our society: they dictate how individuals should behave and interact within a community. They can be written down in laws or other written sources. Interpretations often differ; this is where formalisations offer a solution. They express an interpretation of a source of norms in a transparent manner. However, creating these interpretations is labour intensive. Natural language processing techniques can support this process. Previous work showed the potential of transformer-based models for Dutch law texts. In this paper, we (1) introduce a dataset of 2335 English sentences annotated with legal semantic roles in conformance with the Flint framework; (2) fine-tune a collection of language models on this dataset; and (3) query two non-fine-tuned generative large language models (LLMs). This allows us to compare the performance of fine-tuned domain-specific, task-specific, and general language models with non-fine-tuned generative LLMs. The results show that models fine-tuned on our dataset have the best performance (accuracy around 0.88). Furthermore, domain-specific models perform better than general models, indicating that domain knowledge is of added value for this task. Finally, different methods of querying LLMs perform unsatisfactorily, with maximum accuracy scores around 0.6. This indicates that for specific tasks, such as this adaptation of semantic role labelling, the process of annotating data and fine-tuning a smaller language model is preferred over querying a generative LLM, especially when domain-specific models are available.

Pages: 35

50 items in total

[31] Natural Language Processing Techniques for the Extraction of Semantic Information in Web Services
Bravo, Maricela
Montes, Azucena
Reyes, Alejandro
PROCEEDINGS OF THE SPECIAL SESSION OF THE SEVENTH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE - MICAI 2008, 2008, : 53 - 57
[32] Information Extraction Model based on Semantic Role and Conceptual Graph
Yang, Xuanxuan
Zhang, Lei
2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2009, : 386 - 389
[33] Open Information Extraction from Texts: Part II. Extraction of Semantic Relationships Using Unsupervised Machine Learning
A. O. Shelmanov
D. A. Devyatkin
V. A. Isakov
I. V. Smirnov
Scientific and Technical Information Processing, 2020, 47 : 340 - 347
[34] Open Information Extraction from Texts: Part II. Extraction of Semantic Relationships Using Unsupervised Machine Learning
Shelmanov, A. O.
Devyatkin, D. A.
Isakov, V. A.
Smirnov, I. V.
SCIENTIFIC AND TECHNICAL INFORMATION PROCESSING, 2020, 47 (06) : 340 - 347
[35] A Comparative Analysis on the Summarization of Legal Texts Using Transformer Models
Nunez-Robinson, Daniel
Talavera-Montalto, Jose
Ugarte, Willy
ADVANCED RESEARCH IN TECHNOLOGIES, INFORMATION, INNOVATION AND SUSTAINABILITY, ARTIIS 2022, PT I, 2022, 1675 : 372 - 386
[36] Information extraction from Greek texts
Karra, M
Bekakos, MP
NEURAL, PARALLEL, AND SCIENTIFIC COMPUTATIONS, VOL 2, PROCEEDINGS, 2002, : 17 - 20
[37] Information Extraction of Texts in the Biomedical Domain
Cotik, Viviana
PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 4357 - 4358
[38] The impact of semantic class identification and semantic role labeling on natural language answer extraction
Ofoghi, Bahadorreza
Yearwood, John
Ma, Liping
ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 430 - 437
[39] Fractal feature extraction of English language based on semantic analysis
Yao Z.
International Journal of Reasoning-based Intelligent Systems, 2022, 14 (04) : 215 - 220
[40] A Comparative Study of Large Language Models for Goal Model Extraction
Siddeshwar, Vaishali
Alwidian, Sanaa
Makrehchi, Masoud
ACM/IEEE 27TH INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS: COMPANION PROCEEDINGS, MODELS 2024, 2024, : 253 - 263
