LoGE: an unsupervised local-global document extension generation in information retrieval for long documents

Cited by: 0
Authors
Ayoub, Oussama [1, 2]
Rodrigues, Christophe [1 ]
Travers, Nicolas [1 ]
Affiliations
[1] Leonard de Vinci Pole Universitaire, Research Center, Paris, France
[2] Seville More Helory, Courbevoie, France
Keywords
Unsupervised document expansion; BERT; Information retrieval; BM25; Model
DOI
10.1108/IJWIS-07-2023-0109
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Purpose - This paper aims to address the word gap in information retrieval (IR), especially for long documents from specific domains. With the continuous growth of the text data that modern IR systems must handle, efficient solutions are needed to find the best set of documents for a given query. The words used to describe a query can differ from those used in related documents, and despite their closeness in meaning, such non-overlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach - To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry general or domain-specific knowledge, which helps build a better document representation.

Findings - The authors evaluate their approach on specific IR domains with long documents to show the generality of the proposed model, and they achieve encouraging results.

Originality/value - To the best of the authors' knowledge, this paper introduces an original unsupervised and modular IR system based on recent DL methods.
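The word gap described in the abstract, and how document expansion narrows it, can be illustrated with a small sketch. It is not the authors' LoGE system: it assumes Okapi BM25 scoring (listed in the keywords) with the common defaults k1 = 1.2 and b = 0.75, and it replaces the masked language model with a hard-coded, hypothetical table of related words (`RELATED_WORDS`). A query term that does not occur in a document contributes nothing to that document's BM25 score until expansion appends a related word.

```python
import math
from collections import Counter

# Hypothetical stand-in for masked-language-model predictions. In the setting the
# abstract describes, these related words would be inferred by a pretrained model.
RELATED_WORDS = {
    "automobile": ["car", "vehicle"],
    "physician": ["doctor"],
}


def expand(doc):
    """Append the related words of every term to a tokenized document."""
    extra = [word for term in doc for word in RELATED_WORDS.get(term, [])]
    return doc + extra


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # a non-overlapping query term adds nothing
            # Lucene-style idf, which stays non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the automobile engine overheated on the highway".split(),
    "the physician reviewed the patient file".split(),
]
query = ["doctor"]

before = bm25_scores(query, docs)  # "doctor" occurs in neither document
after = bm25_scores(query, [expand(d) for d in docs])  # expansion adds "doctor" to doc 2
print(before, after)
```

In the approach the abstract describes, the appended words would instead be inferred by a pretrained masked language model such as BERT; the hard-coded table only stands in for that step so the retrieval effect can be seen in isolation.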
Pages: 244-262
Number of pages: 19