Natural Language Processing Applications in Case-Law Text Publishing

Cited by: 2
Authors
Tarasconi, Francesco [1]
Botros, Milad [1]
Caserio, Matteo [1]
Sportelli, Gianpiero [1]
Giacalone, Giuseppe [2]
Uttini, Carlotta [2]
Vignati, Luca [2]
Zanetta, Fabrizio [2]
Affiliations
[1] CELI Language Technol, Via San Quintino 31, I-10121 Turin, Italy
[2] Giuffre Francis Lefebvre, Milan, Italy
Keywords
natural language processing; applications; transfer learning; language models; text classification; information extraction; publishing industry; machine learning; BERT fine-tuning; random forest; Italian language
DOI
10.3233/FAIA200859
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Processing case-law content for electronic publishing is a time-consuming activity that encompasses several sub-tasks and usually involves adding annotations to the original text. Recent advances in Artificial Intelligence and Natural Language Processing, however, enable the automatic and efficient analysis of large volumes of text. In this paper we present our Machine Learning solution to three specific business problems regularly met by a real-world Italian publisher in its day-to-day work: recognition of legal references in text spans, ranking of new content by relevance, and text classification according to a given tree of topics. We experimented with different approaches based on the BERT language model, together with alternatives typically based on Bag-of-Words. The optimal solution, deployed in a controlled production environment, was based on fine-tuned BERT in two of the three cases (extraction of legal references and text classification), while for relevance ranking a Random Forest model with hand-crafted features was preferred. We conclude by discussing the concrete impact of the developed prototypes, as perceived by the publisher.
Pages: 154-163
Page count: 10