The Utility of Context When Extracting Entities From Legal Documents

Cited by: 3
Authors
Donnelly, Jonathan [1]
Roegiest, Adam [1]
Affiliations
[1] Kira Syst, Toronto, ON, Canada
DOI
10.1145/3340531.3412746
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Because target information is extremely rare in legal documents, we find that the traditional approach of tagging every sentence in a document is inferior, in both effectiveness and the data required to train and predict, to a two-stage approach: a first-pass layer identifies sentences that are likely to contain the relevant information, and the more traditional sentence-level sequence tagging is then run on those sentences. Moreover, we find that such entity-level models can be improved by training on a balanced sample of relevant and non-relevant sentences. We additionally describe the use of our system in production and how its usage by clients means that deep learning architectures tend to be cost-inefficient, especially with respect to the time needed to train models.
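The two-stage idea in the abstract (filter sentences first, then tag only the survivors) and the balanced-sampling step can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' system: the cue-word filter stands in for a trained sentence-relevance classifier, the date regex stands in for a sentence-level sequence tagger, and all function names are hypothetical.

```python
import random
import re

# Hypothetical cue words standing in for a trained sentence-relevance classifier.
CUE_WORDS = {"commence", "commences", "effective", "terminate", "terminates"}

# Toy stand-in for a sentence-level sequence tagger: matches dates like "March 1, 2020".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def split_sentences(document):
    """Naive splitter: break after a period that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", document) if s.strip()]


def is_likely_relevant(sentence):
    """First pass: keep a sentence only if it contains at least one cue word."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & CUE_WORDS)


def tag_entities(sentence):
    """Second pass: return the date strings found in one sentence."""
    return DATE_PATTERN.findall(sentence)


def extract(document):
    """Run the tagger only on sentences that pass the first-pass filter."""
    results = []
    for sentence in split_sentences(document):
        if not is_likely_relevant(sentence):
            continue  # skipped sentences never reach the tagger
        for entity in tag_entities(sentence):
            results.append((sentence, entity))
    return results


def balanced_sample(sentences, labels, seed=0):
    """Downsample the larger class so relevant and non-relevant counts match."""
    rng = random.Random(seed)
    positives = [s for s, y in zip(sentences, labels) if y]
    negatives = [s for s, y in zip(sentences, labels) if not y]
    k = min(len(positives), len(negatives))
    sample = [(s, 1) for s in rng.sample(positives, k)]
    sample += [(s, 0) for s in rng.sample(negatives, k)]
    rng.shuffle(sample)
    return sample
```

In this sketch a date in a sentence without a cue word is never tagged, which mirrors the point that the first pass spares the tagger from the many sentences that carry no target information. The balanced sampler would be applied to the tagger's training sentences, not at prediction time.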
Pages: 2397-2404
Number of pages: 8
Related papers
50 in total
  • [1] Extracting indices from Japanese legal documents
    Tho Thi Ngoc Le
    Shirai, Kiyoaki
    Minh Le Nguyen
    Shimazu, Akira
    ARTIFICIAL INTELLIGENCE AND LAW, 2015, 23 (04) : 315 - 344
  • [2] Extracting Complex Named Entities in Legal Documents via Weakly Supervised Object Detection
    Yang, Hsiu-Wei
    Agrawal, Abhinav
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3349 - 3353
  • [3] Automatic Extraction of Entities and Relation from Legal Documents
    Andrew, Judith Jeyafreeda
    Tannier, Xavier
    NAMED ENTITIES, 2018, : 1 - 8
  • [4] Applying GaiusT for Extracting Requirements from Legal Documents
    Zeni, Nicola
    Mich, Luisa
    Mylopoulos, John
    Cordy, James R.
    2013 6TH INTERNATIONAL WORKSHOP ON REQUIREMENTS ENGINEERING AND LAW (RELAW), 2013, : 65 - 68
  • [5] A Methodology for Extracting Legal Norms from Regulatory Documents
    Hashmi, Mustafa
    PROCEEDINGS OF THE 2015 IEEE 19TH INTERNATIONAL ENTERPRISE DISTRIBUTED OBJECT COMPUTING CONFERENCE WORKSHOPS AND DEMONSTRATIONS (EDOCW 2015), 2015, : 41 - 50
  • [6] Context Driven Approach for Extracting Relevant Documents from WWW
    Sarika
    Chaudhary, Meena
    2015 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION & AUTOMATION (ICCCA), 2015, : 837 - 842
  • [7] Extracting Named Entities from Russian-Language Documents with Varying Degrees of Structural Clarity
    M. D. Averina
    O. A. Levanova
    Automatic Control and Computer Sciences, 2024, 58 (7) : 969 - 976
  • [8] Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents
    Pires, Ramon
    de Souza, Fabio C.
    Rosa, Guilherme
    Lotufo, Roberto A.
    Nogueira, Rodrigo
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 83 - 95
  • [9] Automatic detection and analysis of DPP entities in legal contract documents
    Nayak, Shiva Prasad
    Pasumarthi, Suresh
    2019 FIRST INTERNATIONAL CONFERENCE ON DIGITAL DATA PROCESSING (DDP), 2019, : 70 - 75
  • [10] Extracting Geospatial Entities from Wikipedia
    Witmer, Jeremy
    Kalita, Jugal
    2009 IEEE THIRD INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2009), 2009, : 450 - 457