Topic Models Ensembles for AD-HOC Information Retrieval

Cited by: 2
Authors
Ormeno, Pablo [1]
Mendoza, Marcelo [1]
Valle, Carlos [2]
Affiliations
[1] Univ Tecn Federico Santa Maria, Dept Informat, Valparaiso 2340000, Chile
[2] Univ Playa Ancha Ciencias Educ, Dept Informat, Valparaiso 2340000, Chile
Keywords
ad hoc information retrieval; Latent Dirichlet Allocation (LDA); bagging; boosting
DOI
10.3390/info12090360
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Ad hoc information retrieval (ad hoc IR) is a challenging task that consists of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. One limitation of these methods is their inability to handle polysemous concepts. In addition, they introduce spurious orthogonality between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) build representations of text documents in the latent space of topics, which models polysemy better and avoids orthogonal representations of related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection follows the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. Then, we merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal improves precision and recall, outperforming classic IR models and strong baselines based on topic models.
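The fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the Borda-style count below is an assumption chosen for illustration, not the authors' method; the function name `fuse_top_k` and the document identifiers are hypothetical. Each input list stands for the ranking produced by one ensemble member (for example, one LDA model trained on a bootstrap sample).

```python
from collections import defaultdict


def fuse_top_k(rankings, k):
    """Fuse several ranked lists into one top-k list.

    Assumption: a Borda-style count. A document at position p (0-based)
    within a member's top-k earns k - p points; documents outside a
    member's top-k earn nothing from that member.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking[:k]):
            scores[doc_id] += k - position
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


# Hypothetical rankings from three ensemble members for one query.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = fuse_top_k(rankings, k=3)
print(fused)
```

A positional count like this needs only the rank order from each member, which avoids having to normalize retrieval scores that may not be comparable across models.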
Pages: 17