Extending Full Text Search for Legal Document Collections Using Word Embeddings

Times cited: 11
Authors
Landthaler, Joerg [1]
Waltl, Bernhard [1]
Holl, Patrick [1]
Matthes, Florian [1]
Affiliation
[1] Tech Univ Munich, Dept Informat, Software Engn Business Informat Syst, Munich, Germany
Keywords
information retrieval; full text search; relatedness search; recommender systems; text mining; word embeddings; EU-DSGVO; rental contracts
DOI
10.3233/978-1-61499-726-9-73
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional full text search allows fast search for exact matches. However, it is poorly suited to handling synonyms and semantically related terms and phrases. In this paper we explore a novel method that finds not only exact matches but also semantically similar passages for search queries of arbitrary length. We achieve this without ontologies, basing our approach instead on word embeddings, which have recently been applied successfully to many natural language processing tasks. We argue that our method is well suited for legal document collections and examine its applicability in two use cases. First, we conduct a case study on a stand-alone law, the EU Data Protection Directive 95/46/EC (EU-DPD), to extract obligations. Second, we retrieve similar provisions from a collection of publicly available templates for German rental contracts.
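The abstract does not spell out how a query of arbitrary length is matched against the text. A minimal sketch, assuming one common design: represent the query and every document window containing the same number of tokens by the average of their word vectors, then rank the windows by cosine similarity. The embedding table `TOY_EMBEDDINGS`, the helper names (`mean_vector`, `cosine`, `semantic_search`) and the whitespace tokenization are hypothetical; a real system would load vectors trained on the collection (for example with word2vec) instead of hand-written three-dimensional ones. This is an illustration, not necessarily the authors' exact procedure.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

# Hypothetical hand-made 3-d vectors for illustration only; real embeddings
# would be trained on the legal corpus (e.g. with word2vec).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "tenant": [1.0, 0.0, 0.0],
    "lessee": [0.9, 0.1, 0.0],  # deliberately close to "tenant"
    "pays": [0.0, 1.0, 0.0],
    "rent": [0.1, 0.9, 0.0],
}


def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of tokens that have one; None if no token is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(
    query: str, document: str, emb: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[float, int, str]]:
    """Rank every window with as many tokens as the query by similarity to it.

    Returns (score, start_token_index, window_text) tuples, best first.
    """
    q_tokens = query.lower().split()
    doc_tokens = document.lower().split()
    q_vec = mean_vector(q_tokens, emb)
    if q_vec is None:
        return []  # no query word has an embedding
    n = len(q_tokens)
    hits: List[Tuple[float, int, str]] = []
    for start in range(len(doc_tokens) - n + 1):
        window = doc_tokens[start:start + n]
        w_vec = mean_vector(window, emb)
        if w_vec is not None:  # skip windows made only of unknown words
            hits.append((cosine(q_vec, w_vec), start, " ".join(window)))
    hits.sort(key=lambda h: (-h[0], h[1]))  # highest score first, then earliest
    return hits[:top_k]


if __name__ == "__main__":
    for score, start, text in semantic_search(
        "tenant pays", "The lessee pays rent monthly to the landlord", TOY_EMBEDDINGS
    ):
        print(f"{score:.3f}  token {start}: {text}")
```

With these toy vectors, the query "tenant pays" ranks the window "lessee pays" first even though "tenant" never occurs in the text, which is the kind of match exact full text search misses. Because cosine similarity ignores vector length, summing the word vectors instead of averaging them would give the same ranking.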
Pages: 73-82
Number of pages: 10
Related papers
  • [1] HistorEx: Exploring Historical Text Corpora Using Word and Document Embeddings
    Mueller, Sven
    Brunzel, Michael
    Kaun, Daniela
    Biswas, Russa
    Koutraki, Maria
    Tietz, Tabea
    Sack, Harald
    [J]. SEMANTIC WEB: ESWC 2019 SATELLITE EVENTS, 2019, 11762 : 136 - 140
  • [2] Arabic Text Classification Based on Word and Document Embeddings
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 32 - 41
  • [3] Text Classification Using Word Embeddings
    Helaskar, Mukund N.
    Sonawane, Sheetal S.
    [J]. 2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019,
  • [4] Automatic document screening of medical literature using word and text embeddings in an active learning setting
    Carvallo, Andres
    Parra, Denis
    Lobel, Hans
    Soto, Alvaro
    [J]. SCIENTOMETRICS, 2020, 125 (03) : 3047 - 3084
  • [5] Automatic Text Summarization using Word Embeddings
    Easwar, Arjun
    Uthra, Annie
    [J]. PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021, : 1065 - 1079
  • [6] Extending Full Text Search Engine for Mathematical Content
    Misutka, Jozef
    Galambos, Leo
    [J]. DML 2008 - TOWARDS DIGITAL MATHEMATICS LIBRARY, 2008, : 55 - 67
  • [7] Figure search by text in large scale digital document collections
    Yurtsever, M. Mücahit Enes
    Özcan, Muhammet
    Taruz, Zübeyir
    Eken, Süleyman
    Sayar, Ahmet
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (01)
  • [8] Single document summarization using word and sentence embeddings
    Ayana
    [J]. PROCEEDINGS OF THE 2015 JOINT INTERNATIONAL MECHANICAL, ELECTRONIC AND INFORMATION TECHNOLOGY CONFERENCE (JIMET 2015), 2015, 10 : 523 - 526