Extending Full Text Search for Legal Document Collections Using Word Embeddings

Cited by: 11
Authors
Landthaler, Joerg [1]
Waltl, Bernhard [1]
Holl, Patrick [1]
Matthes, Florian [1]
Affiliation
[1] Technical University of Munich, Department of Informatics, Software Engineering for Business Information Systems, Munich, Germany
Keywords
information retrieval; full text search; relatedness search; recommender systems; text mining; word embeddings; EU-DSGVO; rental contracts
DOI
10.3233/978-1-61499-726-9-73
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional full text search allows fast search for exact matches. However, full text search is not well suited to handling synonyms or semantically related terms and phrases. In this paper we explore a novel method that finds not only exact matches but also semantically similar passages for search queries of arbitrary length. We achieve this without ontologies, basing our approach on word embeddings instead. Word embeddings have recently been applied successfully to many natural language processing tasks. We argue that our method is well suited for legal document collections and examine its applicability in two use cases. First, we conduct a case study on a stand-alone law, the EU Data Protection Directive 95/46/EC (EU-DPD), in order to extract obligations. Second, we retrieve similar provisions from a collection of publicly available templates for German rental contracts.
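The abstract does not state how passages are scored. As one illustration of embedding-based relatedness search, here is a minimal Python sketch that assumes the query and every query-length window of a document are each represented by the average of their word vectors, with windows ranked by cosine similarity to the query. This is not necessarily the authors' method; the toy `EMBEDDINGS` table and the helper names `mean_vector`, `cosine` and `semantic_search` are hypothetical, and a real system would use vectors trained on the legal corpus (for example with word2vec).

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# A real system would load vectors trained on the legal corpus instead.
EMBEDDINGS = {
    "tenant": [0.90, 0.10, 0.00],
    "lessee": [0.85, 0.15, 0.05],
    "must":   [0.10, 0.90, 0.00],
    "shall":  [0.12, 0.88, 0.02],
    "pay":    [0.00, 0.20, 0.90],
    "rent":   [0.05, 0.10, 0.95],
    "the":    [0.30, 0.30, 0.30],
}


def mean_vector(tokens):
    """Average the vectors of all in-vocabulary tokens; None if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query, document, top_k=3):
    """Rank every query-length window of the document by the cosine similarity
    between its mean vector and the query's mean vector."""
    q_tokens = query.lower().split()
    d_tokens = document.lower().split()
    q_vec = mean_vector(q_tokens)
    if q_vec is None:
        return []  # no query word has an embedding
    n = len(q_tokens)
    hits = []
    for start in range(len(d_tokens) - n + 1):
        window = d_tokens[start:start + n]
        w_vec = mean_vector(window)
        if w_vec is None:
            continue  # window contains no known words
        hits.append((cosine(q_vec, w_vec), start, " ".join(window)))
    # Highest similarity first.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    doc = "The lessee shall pay the rent monthly"
    for score, start, text in semantic_search("tenant must pay", doc):
        print(f"{score:.3f}  token {start}: {text}")
```

Run on the query "tenant must pay" and the sentence "The lessee shall pay the rent monthly", the sketch finds no exact match, yet it ranks the window "lessee shall pay" first because its toy vectors lie close to those of the query. Averaging is a deliberately simple choice: it ignores word order, so any permutation of the query words would score just as high.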
Pages: 73-82
Page count: 10
Related papers (50 in total)
  • [21] Search Result Personalization in Twitter Using Neural Word Embeddings
    Samarawickrama, Sameendra
    Karunasekera, Shanika
    Harwood, Aaron
    Kotagiri, Ramamohanarao
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2017, 2017, 10440: 244-258
  • [22] THE CURSE OF THAMUS - AN ANALYSIS OF FULL-TEXT LEGAL DOCUMENT-RETRIEVAL
    DABNEY, DP
    LAW LIBRARY JOURNAL, 1986, 78 (01): 5-40
  • [23] MULTITOPIC TEXT CLUSTERING AND CLUSTER LABELING USING CONTEXTUALIZED WORD EMBEDDINGS
    Ostapiuk, Z. V.
    Korotyeyeva, T. O.
    RADIO ELECTRONICS COMPUTER SCIENCE CONTROL, 2020, (04): 95-105
  • [24] Search and Graphical Visualization of Concepts in Document Collections Using Taxonomies
    Schmidt, Andreas
    Kimmig, Daniel
    Dickerhof, Markus
    PROCEEDINGS OF THE 46TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, 2013: 1429-1434
  • [25] Visualizing search results and document collections using topic maps
    Newman, David
    Baldwin, Timothy
    Cavedon, Lawrence
    Huang, Eric
    Karimi, Sarvnaz
    Martinez, David
    Scholer, Falk
    Zobel, Justin
    JOURNAL OF WEB SEMANTICS, 2010, 8 (2-3): 169-175
  • [26] A Knowledge Discovery from Full-Text Document Collections Using Clustering and Interpretable Genetic-Fuzzy Systems
    Rudzinski, Filip
    MULTIMEDIA AND NETWORK INFORMATION SYSTEMS, 2019, 833: 434-443
  • [27] Using Text Search for Personal Photo Collections with the MediAssist System
    O'Hare, Neil
    Gurrin, Cathal
    Jones, Gareth J. F.
    Lee, Hyowon
    O'Connor, Noel E.
    Smeaton, Alan F.
    APPLIED COMPUTING 2007, VOL 1 AND 2, 2007: 880-881
  • [28] Document Summarization Using Sentence-Level Semantic Based on Word Embeddings
    Al-Sabahi, Kamal
    Zhang Zuping
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2019, 29 (02): 177-196
  • [29] REPRESENTING WORD IMAGE USING VISUAL WORD EMBEDDINGS AND RNN FOR KEYWORD SPOTTING ON HISTORICAL DOCUMENT IMAGES
    Wei, Hongxi
    Zhang, Hui
    Gao, Guanglai
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 1368-1373
  • [30] Deep text classification of Instagram data using word embeddings and weak supervision
    Hammar, Kim
    Jaradat, Shatha
    Dokoohaki, Nima
    Matskin, Mihhail
    WEB INTELLIGENCE, 2020, 18 (01): 53-67