Legal sentence boundary detection using hybrid deep learning and statistical models

Authors
Sheik, Reshma [1]
Ganta, Sneha Rao [1]
Nirmala, S. Jaya [1]
Affiliations
[1] Natl Inst Technol Trichy, Tiruchirappalli, Tamil Nadu, India
Keywords
Natural language processing; Sentence boundary detection; Deep learning; Transformer; LegalBERT; CaseLawBERT; CNN; CRF
DOI
10.1007/s10506-024-09394-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentence boundary detection (SBD) represents an important first step in natural language processing since accurately identifying sentence boundaries significantly impacts downstream applications. Nevertheless, detecting sentence boundaries within legal texts poses a unique and challenging problem due to their distinct structural and linguistic features. Our approach utilizes deep learning models to leverage delimiter and surrounding context information as input, enabling precise detection of sentence boundaries in English legal texts. We evaluate various deep learning models, including domain-specific transformer models like LegalBERT and CaseLawBERT. To assess the efficacy of our deep learning models, we compare them with a state-of-the-art domain-specific statistical conditional random field (CRF) model. After considering model size, F1-score, and inference time, we identify the convolutional neural network (CNN) model as the top-performing deep learning model. To further enhance performance, we integrate the features of the CNN model into the subsequent CRF model, creating a hybrid architecture that combines the strengths of both models. Our experiments demonstrate that the hybrid model outperforms the baseline model, achieving a 4% improvement in the F1-score. Additional experiments showcase the superiority of the hybrid model over open-source SBD libraries when confronted with an out-of-domain test set. These findings underscore the importance of efficient SBD in legal texts and emphasize the advantages of employing deep learning models and hybrid architectures to achieve optimal performance.
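Illustrative sketch (not from the paper): the abstract describes feeding each delimiter together with its surrounding context to a model. The Python below shows one way such candidate windows could be extracted. The delimiter set, the character-level window, the `width` default, and the names `CANDIDATE_DELIMITERS` and `candidate_windows` are all assumptions made for illustration; the abstract does not specify any of them. A classifier such as the CNN mentioned above would then label each window as boundary or non-boundary, which is not shown here.

```python
import re
from typing import List, Tuple

# Assumed delimiter set; the abstract does not list the delimiters actually used.
CANDIDATE_DELIMITERS = re.compile(r"[.?!;:]")


def candidate_windows(text: str, width: int = 6) -> List[Tuple[int, str, str, str]]:
    """Return (offset, left context, delimiter, right context) for every candidate.

    `width` is the number of characters kept on each side of the delimiter
    (a hypothetical choice; a real system might use tokens instead).
    """
    windows = []
    for match in CANDIDATE_DELIMITERS.finditer(text):
        i = match.start()
        left = text[max(0, i - width):i]      # context before the delimiter
        right = text[i + 1:i + 1 + width]     # context after the delimiter
        windows.append((i, left, match.group(), right))
    return windows


# Legal citations contain many periods that are not sentence boundaries.
sample = "See 42 U.S.C. 1983. The court agreed."
for offset, left, delim, right in candidate_windows(sample):
    print(offset, repr(left), delim, repr(right))
```

In this sample, only some of the periods end a sentence; the ones inside "U.S.C." do not, which is the kind of ambiguity the context window is meant to help a model resolve.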
Pages: 31
Related papers
50 in total
  • [1] A Hybrid Deep Learning Architecture for Sentence Unit Detection
    Duy-Cat Can
    Ho, Thi-Nga
    Chng, Eng-Siong
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 129 - 132
  • [2] Detection and classification of electrocardiography using hybrid deep learning models
    Selvam, Immaculate Joy
    Madhavan, Moorthi
    Kumarasamy, Senthil Kumar
    HELLENIC JOURNAL OF CARDIOLOGY, 2025, 81 : 75 - 84
  • [3] Sentence Boundary Detection in German Legal Documents
    Glaser, Ingo
    Moser, Sebastian
    Matthes, Florian
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 812 - 821
  • [4] Detection and classification of brain tumor using hybrid deep learning models
    Baiju Babu Vimala
    Saravanan Srinivasan
    Sandeep Kumar Mathivanan
    Mahalakshmi
    Prabhu Jayagopal
    Gemmachis Teshite Dalu
    Scientific Reports, 2023, 13 (01)
  • [5] Detection of land subsidence using hybrid and ensemble deep learning models
    Kariminejad, Narges
    Mohammadifar, Aliakbar
    Sepehr, Adel
    Garajeh, Mohammad Kazemi
    Rezaei, Mahrooz
    Desir, Gloria
    Quesada-Roman, Adolfo
    Gholami, Hamid
    APPLIED GEOMATICS, 2024, 16 (03) : 593 - 610
  • [6] Detection and classification of brain tumor using hybrid deep learning models
    Babu Vimala, Baiju
    Srinivasan, Saravanan
    Mathivanan, Sandeep Kumar
    Mahalakshmi
    Jayagopal, Prabhu
    Dalu, Gemmachis Teshite
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [7] MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset
    Brugger, Tobias
    Sturmer, Matthias
    Niklaus, Joel
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND LAW, ICAIL 2023, 2023, : 42 - 51
  • [8] Rootkit Detection Using Hybrid Machine Learning Models and Deep Learning Model: Implementation
    Kumar, Suresh S.
    Stephen, S.
    Rumysia, Suhainul M.
    2024 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS, ACCAI 2024, 2024,
  • [9] Depression Detection in Social Media Using NLP and Hybrid Deep Learning Models
    Padmaja, S. M.
    Godla, Sanjiv Rao
    Ramesh, Janjhyam Venkata Naga
    Muniyandy, Elangovan
    Sridevi, Pothumarthi
    El-Ebiary, Yousef A. Baker
    Devadhas, David Neels Ponkumar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (02) : 1071 - 1080
  • [10] Damaged cable detection with statistical analysis, clustering, and deep learning models
    Son, Hyesook
    Yoon, Chanyoung
    Kim, Yejin
    Jang, Yun
    Tran, Linh Viet
    Kim, Seung-Eock
    Kim, Dong Joo
    Park, Jongwoong
    SMART STRUCTURES AND SYSTEMS, 2022, 29 (01) : 17 - 28