Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model

Times cited: 0
Authors
Kalogeropoulos, Nikitas-Rigas [1]
Kontogiannis, George [1]
Makris, Christos [1]
Affiliation
[1] Computer Engineering & Informatics Department, University Campus, Patras 26504, Achaia, Greece
Keywords
Information retrieval; Information retrieval models; Set-based model; Graphical representation of textual data; Clustering; Spectral clustering; Graph embeddings
DOI
10.1016/j.eswa.2024.125771
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a straightforward yet novel approach to enhancing graph-based information retrieval models: it calibrates the relationships between term nodes, which improves evaluation metrics at the retrieval phase, and it reduces the total size of the graph. This is achieved by integrating spectral clustering, embedding-based graph pruning, and term re-weighting. Spectral clustering assigns each term to a specific cluster, which allows us to propose two pruning methods based on node similarities: in-cluster pruning, which removes edges between terms within the same cluster, and out-cluster pruning, which removes edges that connect different clusters. Both methods use spectral embeddings to assess node similarities, resulting in more manageable clusters termed concepts. These concepts are likely to contain semantically similar terms, and each term's concept is defined as the centroid of its cluster. We show that this graph pruning strategy significantly improves the performance and effectiveness of the overall model while also making its graph sparser. Moreover, during the retrieval phase, the conceptually calibrated centroids are used to re-weight the terms of user queries, and the precomputed embeddings enable efficient query expansion through a k-nearest neighbors (k-NN) approach, offering substantial improvement at minimal additional time cost. To the best of our knowledge, this is the first application of spectral clustering and embedding-based conceptualization to pruning graph-based IR models. Our approach improves both retrieval efficiency and performance while enabling effective query expansion with minimal additional computational overhead. The proposed technique is applied across various graph-based information retrieval models, improving evaluation metrics and producing sparser graphs.
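The pipeline in the abstract (spectral embedding of the term graph, clustering, similarity-based in-cluster and out-cluster pruning, and k-NN query expansion over the precomputed embeddings) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a symmetric, non-negative term-adjacency matrix `W`, an embedding taken from the smallest eigenvectors of the symmetric normalized Laplacian with row normalization (Ng-Jordan-Weiss style), plain k-means for the cluster assignment, and cosine-similarity thresholds for pruning. The function names, the thresholds `in_thresh` and `out_thresh`, and the toy graph are all hypothetical. The centroid-based query re-weighting is left out because the abstract does not give its formula.

```python
import numpy as np


def spectral_embedding(W: np.ndarray, dim: int) -> np.ndarray:
    """Row-normalised smallest eigenvectors of the symmetric normalised Laplacian."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard against isolated nodes
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    U = vecs[:, :dim]  # eigenvectors of the `dim` smallest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(norms > 0, norms, 1.0)  # unit-length rows


def kmeans(X: np.ndarray, k: int, iters: int = 100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    C = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C


def prune(W, E, labels, in_thresh, out_thresh):
    """Zero out edges whose endpoint embeddings have cosine similarity below a threshold.

    Edges inside a cluster use `in_thresh`; edges between clusters use `out_thresh`.
    """
    S = E @ E.T  # rows of E have unit length, so this is cosine similarity
    same = labels[:, None] == labels[None, :]
    keep = np.where(same, S >= in_thresh, S >= out_thresh)
    return np.where(keep, W, 0.0)


def expand_query(query_ids, E, k):
    """Add the k nearest neighbours (by cosine similarity) of each query term."""
    S = E @ E.T
    expanded = set(query_ids)
    for q in query_ids:
        order = np.argsort(-S[q], kind="stable")  # most similar first
        neighbours = [int(j) for j in order if j != q][:k]
        expanded.update(neighbours)
    return sorted(expanded)


if __name__ == "__main__":
    # Hypothetical toy term graph: two 4-term cliques joined by one bridge edge (3, 4).
    W = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
    W[3, 4] = W[4, 3] = 1.0
    E = spectral_embedding(W, dim=2)
    labels, _ = kmeans(E, k=2)
    P = prune(W, E, labels, in_thresh=0.0, out_thresh=0.5)
    print("clusters:", labels)
    print("edges kept:", int((P > 0).sum()) // 2)
    print("expanded query:", expand_query([0], E, k=1))
```

On this toy graph the two cliques are expected to fall into separate clusters, and the weakly similar bridge edge is the one that out-cluster pruning removes. Using separate thresholds for the two edge types mirrors the in-cluster/out-cluster distinction in the abstract; how the paper actually sets or learns those cut-offs is not stated here.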
Pages: 15