Legal sentence boundary detection using hybrid deep learning and statistical models

Authors
Sheik, Reshma [1]
Ganta, Sneha Rao [1]
Nirmala, S. Jaya [1]
Affiliations
[1] Natl Inst Technol Trichy, Tiruchirappalli, Tamil Nadu, India
Keywords
Natural language processing; Sentence boundary detection; Deep learning; Transformer; LegalBERT; CaseLawBERT; CNN; CRF
DOI
10.1007/s10506-024-09394-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentence boundary detection (SBD) is an important first step in natural language processing, because the accuracy with which sentence boundaries are identified significantly affects downstream applications. Detecting sentence boundaries in legal texts is a particularly challenging problem because of their distinctive structural and linguistic features. Our approach uses deep learning models that take a delimiter and its surrounding context as input, enabling precise detection of sentence boundaries in English legal texts. We evaluate several deep learning models, including the domain-specific transformer models LegalBERT and CaseLawBERT. To assess their efficacy, we compare them with a state-of-the-art domain-specific statistical conditional random field (CRF) model. Taking model size, F1-score, and inference time into account, we identify the convolutional neural network (CNN) model as the best-performing deep learning model. To improve performance further, we integrate the features of the CNN model into a subsequent CRF model, creating a hybrid architecture that combines the strengths of both. Our experiments show that the hybrid model outperforms the baseline model, improving the F1-score by 4%. Additional experiments show that the hybrid model also outperforms open-source SBD libraries on an out-of-domain test set. These findings underscore the importance of efficient SBD in legal texts and the advantages of deep learning models and hybrid architectures for this task.
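The abstract describes the input representation (a delimiter plus its surrounding context) and the hybrid step (CNN features fed into a CRF) only at a high level. The following is a minimal Python sketch under our own assumptions, not the authors' implementation: the delimiter set `[.?!;:]`, the window size `k`, the feature names, and the `cnn_score` callback (a hypothetical stand-in for a trained CNN's boundary probability) are all illustrative. Each document becomes one sequence of per-delimiter feature dicts, the list-of-dicts shape that CRF toolkits such as sklearn-crfsuite accept as input.

```python
import re
from typing import Callable, Dict, List

# Assumed candidate delimiters; the paper's exact delimiter set is not stated here.
DELIMITERS = re.compile(r"[.?!;:]")


def candidate_windows(text: str, k: int = 3) -> List[Dict[str, object]]:
    """Return one record per delimiter with up to k whitespace tokens of context on each side."""
    records = []
    for m in DELIMITERS.finditer(text):
        left = text[: m.start()].split()[-k:]
        right = text[m.end():].split()[:k]
        records.append({"offset": m.start(), "delim": m.group(), "left": left, "right": right})
    return records


def crf_features(rec: Dict[str, object],
                 cnn_score: Callable[[Dict[str, object]], float]) -> Dict[str, object]:
    """Hand-crafted CRF features plus a hypothetical CNN boundary probability."""
    prev = rec["left"][-1] if rec["left"] else "<BOS>"
    nxt = rec["right"][0] if rec["right"] else "<EOS>"
    return {
        "delim": rec["delim"],
        "prev.lower": prev.lower(),
        "prev.is_short": len(prev) <= 3,  # short tokens such as "v" or "No" are often abbreviations
        "next.is_title": nxt.istitle(),   # a capitalised next token hints at a new sentence
        "cnn_prob": cnn_score(rec),       # hybrid step: CNN output used as one more CRF feature
    }


def document_features(text: str,
                      cnn_score: Callable[[Dict[str, object]], float],
                      k: int = 3) -> List[Dict[str, object]]:
    """One CRF input sequence per document: one feature dict per candidate delimiter."""
    return [crf_features(rec, cnn_score) for rec in candidate_windows(text, k)]


if __name__ == "__main__":
    text = "The court held in Smith v. Jones that the appeal fails. It was dismissed."
    # Placeholder scorer; a real system would call a trained CNN here.
    for feats in document_features(text, cnn_score=lambda rec: 0.5):
        print(feats)
```

In the example sentence, the period after "v" is preceded by a short token and followed by a title-case word, so local cues alone look much like a true boundary; this is the kind of ambiguity that wider context, and in the hybrid setting the CNN score, is meant to help the CRF resolve.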
Pages: 31