LoGE: an unsupervised local-global document extension generation in information retrieval for long documents

Cited by: 0
Authors
Ayoub, Oussama [1, 2]
Rodrigues, Christophe [1 ]
Travers, Nicolas [1 ]
Affiliations
[1] Leonard de Vinci Pole Univ, Res Ctr, Paris, France
[2] Seville More Helory, Courbevoie, France
Keywords
Unsupervised document expansion; BERT; Information retrieval; BM25; MODEL
DOI
10.1108/IJWIS-07-2023-0109
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. In fact, with the continuous growth of text data that modern IR systems have to manage, existing solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Despite meaning closeness, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains. Design/methodology/approach - To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. Used DL models are pretrained on massive text data and carry common or specific domain knowledge to propose a better document representation. Findings - The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results. Originality/value - In this paper, to the best of the authors' knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.
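The word gap described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `RELATED` table is a hypothetical stand-in for the words a masked language model would propose, the function names `bm25_scores` and `expand` are invented for illustration, and the BM25 parameters `k1` and `b` are assumed defaults. It only shows how appending related words to a document lets a lexical ranker such as BM25 match a query whose words do not appear in the original text.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # a non-overlapping word contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical related-word table; in the paper's setting these words
# would be inferred by a pretrained masked language model instead.
RELATED = {"automobile": ["car", "vehicle"], "physician": ["doctor"]}


def expand(doc):
    """Append related words that are not already present in the document."""
    seen = set(doc)
    added = []
    for token in doc:
        for word in RELATED.get(token, []):
            if word not in seen:
                seen.add(word)
                added.append(word)
    return doc + added


docs = [
    ["the", "automobile", "engine", "failed"],
    ["the", "physician", "examined", "the", "patient"],
]
query = ["car"]

before = bm25_scores(query, docs)                      # no document contains "car"
after = bm25_scores(query, [expand(d) for d in docs])  # first document now matches
```

Before expansion, no document shares a word with the query, so every score is zero. After expansion, only the first document gains a positive score, which is the effect document expansion is meant to have on the word gap.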
Pages: 244 - 262
Number of pages: 19
Related papers
50 in total
  • [1] Local-Global Decompositions for Conditional Microstructure Generation
    Robertson, Andreas E.
    Kelly, Conlain
    Buzzy, Michael
    Kalidindi, Surya R.
    [J]. ACTA MATERIALIA, 2023, 253
  • [2] Local-Global Graph Pooling via Mutual Information Maximization for Video-Paragraph Retrieval
    Zhang, Pengcheng
    Zhao, Zhou
    Wang, Nannan
    Yu, Jun
    Wu, Fei
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 7133 - 7146
  • [3] InPars: Unsupervised Dataset Generation for Information Retrieval
    Bonifacio, Luiz
    Abonizio, Hugo
    Fadaee, Marzieh
    Nogueira, Rodrigo
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022 : 2387 - 2392
  • [4] Remarks on the local-global principle for a subcategory consisting of extension modules
    Yoshizawa, Takeshi
    [J]. JOURNAL OF ALGEBRA AND ITS APPLICATIONS, 2019, 18 (12)
  • [5] A Study of Retrieval Models for Long Documents and Queries in Information Retrieval
    Cummins, Ronan
    [J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016 : 795 - 805
  • [6] Illumination Compensation for Document Images Using Local-Global Block Analysis
    Azmi, Mohd. Hafrizal
    Saripan, M. Iqbal
    Azmir, Raja Syamsul
    Abdullah, Raja
    [J]. VISUAL INFORMATICS: BRIDGING RESEARCH AND PRACTICE, 2009, 5857 : 636 - 644
  • [7] Prototype local-global alignment network for image-text retrieval
    Meng, Lingtao
    Zhang, Feifei
    Zhang, Xi
    Xu, Changsheng
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 525 - 538
  • [8] Automatic Extracapsular Extension Identification in Head and Neck Cancer Using Deep Neural Network with Local-Global Information
    Wang, Y.
    Thomas, T. V.
    Duggar, W. N.
    Roberts, P. R.
    Gatewood, R. T.
    Bian, L.
    Wang, H.
    [J]. INTERNATIONAL JOURNAL OF RADIATION ONCOLOGY BIOLOGY PHYSICS, 2021, 111 (03) : E98 - E98
  • [9] Unsupervised learning aided by clustering and local-global hierarchical analysis in knowledge exploration
    Zhang, Yihao
    Orgun, Mehmet A.
    Lin, Weiqiang
    [J]. Journal of Digital Information Management, 2007, 5 (04) : 237 - 246
  • [10] Effective document information retrieval system for both paper and electronic documents
    Wong, Chan-Tang
    Pun, Chi-Man
    [J]. PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006 : 54 - +