Legal sentence boundary detection using hybrid deep learning and statistical models

Cited by: 0
Authors
Sheik, Reshma [1]
Ganta, Sneha Rao [1]
Nirmala, S. Jaya [1]
Affiliations
[1] National Institute of Technology Tiruchirappalli, Tiruchirappalli, Tamil Nadu, India
Keywords
Natural language processing; Sentence boundary detection; Deep learning; Transformer; LegalBERT; CaseLawBERT; CNN; CRF
DOI
10.1007/s10506-024-09394-x
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentence boundary detection (SBD) is an important first step in natural language processing, since the accuracy of the detected sentence boundaries significantly affects downstream applications. Detecting sentence boundaries in legal texts is particularly challenging because of their distinct structural and linguistic features. Our approach uses deep learning models that take each delimiter and its surrounding context as input to detect sentence boundaries in English legal texts. We evaluate various deep learning models, including domain-specific transformer models such as LegalBERT and CaseLawBERT. To assess their efficacy, we compare them with a state-of-the-art domain-specific statistical conditional random field (CRF) model. Considering model size, F1-score, and inference time, we identify the convolutional neural network (CNN) model as the top-performing deep learning model. To further improve performance, we integrate the features of the CNN model into a subsequent CRF model, creating a hybrid architecture that combines the strengths of both models. Our experiments show that the hybrid model outperforms the baseline model, achieving a 4% improvement in F1-score. Additional experiments show that the hybrid model also outperforms open-source SBD libraries on an out-of-domain test set. These findings underscore the importance of efficient SBD in legal texts and the advantages of deep learning models and hybrid architectures for this task.
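The abstract describes models that take a delimiter and its surrounding context as input. A minimal sketch of that input-preparation step might look like the following. The delimiter set, the character-level window, the window width of 6, and the function name `candidate_windows` are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import re
from typing import List, Tuple

# Assumed delimiter set; the abstract does not list the delimiters actually used.
DELIMITERS = re.compile(r"[.;:?!]")


def candidate_windows(text: str, width: int = 6) -> List[Tuple[int, str, str, str]]:
    """Return (offset, left_context, delimiter, right_context) per candidate.

    Each candidate delimiter is paired with up to `width` characters on
    either side. A classifier (hypothetical here) would then label each
    window as a true sentence boundary or not.
    """
    windows = []
    for match in DELIMITERS.finditer(text):
        i = match.start()
        left = text[max(0, i - width):i]    # context before the delimiter
        right = text[i + 1:i + 1 + width]   # context after the delimiter
        windows.append((i, left, match.group(), right))
    return windows


# Legal citations contain many periods that do not end a sentence.
sample = "See 42 U.S.C. 1983. The court agreed."
for offset, left, delim, right in candidate_windows(sample):
    print(offset, repr(left), delim, repr(right))
```

In the sample, the periods inside "U.S.C." are candidates that a model should reject, while the periods after "1983" and "agreed" are true boundaries. That ambiguity is what makes the surrounding context informative.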
Pages: 31