The Utility of Context When Extracting Entities From Legal Documents

Cited by: 3
Authors
Donnelly, Jonathan [1]
Roegiest, Adam [1]
Affiliations
[1] Kira Syst, Toronto, ON, Canada
DOI
10.1145/3340531.3412746
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Because target information is extremely rare in legal documents, we find that the traditional approach of tagging all sentences in a document is inferior, in both effectiveness and the data required to train and predict, to a two-stage approach: a first-pass layer identifies sentences that are likely to contain the relevant information, and the more traditional sentence-level sequence tagging is then run on those sentences only. Moreover, we find that such entity-level models can be improved by training on a balanced sample of relevant and non-relevant sentences. We additionally describe the use of our system in production and how its usage by clients means that deep learning architectures tend to be cost inefficient, especially with respect to the time needed to train models.
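The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the function names `balanced_sample`, `two_stage_extract`, `toy_filter`, and `toy_tagger` are hypothetical, and the keyword filter and regex date tagger are toy stand-ins for the trained first-pass classifier and sequence tagger that the paper describes.

```python
import random
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def balanced_sample(relevant: List[str], non_relevant: List[str],
                    seed: int = 0) -> List[Tuple[str, int]]:
    """Return an equal number of relevant (label 1) and non-relevant
    (label 0) sentences, downsampling whichever class is larger."""
    rng = random.Random(seed)
    k = min(len(relevant), len(non_relevant))
    sample = [(s, 1) for s in rng.sample(relevant, k)]
    sample += [(s, 0) for s in rng.sample(non_relevant, k)]
    rng.shuffle(sample)
    return sample


def two_stage_extract(sentences: List[str],
                      is_candidate: Callable[[str], bool],
                      tag_sentence: Callable[[str], List[Span]]
                      ) -> List[Tuple[int, List[Span]]]:
    """Stage 1 keeps only sentences the filter accepts; stage 2 runs the
    sentence-level tagger on those sentences alone."""
    results = []
    for idx, sentence in enumerate(sentences):
        if not is_candidate(sentence):
            continue  # most sentences are skipped before tagging
        spans = tag_sentence(sentence)
        if spans:
            results.append((idx, spans))
    return results


# Toy stand-ins (assumptions for illustration, not trained models).
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def toy_filter(sentence: str) -> bool:
    lowered = sentence.lower()
    return "commence" in lowered or "terminat" in lowered


def toy_tagger(sentence: str) -> List[Span]:
    return [(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(sentence)]


document = [
    "This Agreement shall commence on January 1, 2020.",
    "The parties met on March 3, 2019 for lunch.",
    "Notices shall be sent by mail.",
]
extracted = two_stage_extract(document, toy_filter, toy_tagger)
```

In this toy run only the first sentence reaches the tagger, so the date in the second, non-relevant sentence is never tagged. In a setting like the one the abstract describes, `balanced_sample` would be applied to the training sentences of the entity-level tagger, and `toy_filter` would be replaced by a learned sentence classifier.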
Pages: 2397-2404
Page count: 8