Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model

Cited by: 0
Authors
Kalogeropoulos, Nikitas-Rigas [1]
Kontogiannis, George [1]
Makris, Christos [1]
Affiliations
[1] Comp Engn & Informat Dept, Univ Campus, Patras 26504, Achaia, Greece
Keywords
Information retrieval; Information retrieval models; Set-based model; Graphical representation of textual data; Clustering; Spectral clustering; Graph embeddings
DOI
10.1016/j.eswa.2024.125771
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a straightforward yet novel approach to enhancing graph-based information retrieval (IR) models: calibrating the relationships between term nodes, which leads to better evaluation metrics at the retrieval phase, and reducing the total size of the graph. This is achieved by integrating spectral clustering, embedding-based graph pruning, and term re-weighting. Spectral clustering assigns each term to a specific cluster, allowing us to propose two pruning methods based on node similarities: in-cluster and out-cluster pruning. In-cluster pruning prunes edges between terms within the same cluster, while out-cluster pruning prunes edges that connect different clusters. Both methods use spectral embeddings to assess node similarities, resulting in more manageable clusters termed concepts. These concepts are likely to contain semantically similar terms, with each term's concept defined as the centroid of its cluster. We show that this graph pruning strategy significantly enhances the performance and effectiveness of the overall model while also making its graph sparser. Moreover, during the retrieval phase, the conceptually calibrated centroids are used to re-weight terms generated by user queries, and the precomputed embeddings enable efficient query expansion through a k-Nearest Neighbors (k-NN) approach, offering substantial enhancement at minimal additional time cost. To the best of our knowledge, this is the first application of spectral clustering and embedding-based conceptualization to prune graph-based IR models. Our approach improves both retrieval efficiency and performance while enabling effective query expansion with minimal additional computational overhead. The proposed technique is applied across various graph-based information retrieval models, improving evaluation metrics and producing sparser graphs.
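The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term graph, the cosine-similarity pruning rule, the thresholds `in_thr` and `out_thr`, the tiny deterministic k-means, and all function names are assumptions made for illustration. Centroid-based term re-weighting is omitted, since the abstract does not specify its form.

```python
import numpy as np

def spectral_embed(W, dim):
    """Unit-length rows of the `dim` lowest eigenvectors of the normalized Laplacian."""
    d = W.sum(axis=1)                      # weighted degrees (assumed > 0)
    s = 1.0 / np.sqrt(d)
    lap = np.eye(len(W)) - s[:, None] * W * s[None, :]
    _, vecs = np.linalg.eigh(lap)          # eigenvalues come back in ascending order
    X = vecs[:, :dim]
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def kmeans(X, k, iters=20):
    """Small deterministic k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        gap = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(gap)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers                 # centers play the role of "concepts"

def prune(W, X, labels, in_thr, out_thr):
    """Assumed rule: drop an edge when embedding similarity is below its threshold."""
    sim = X @ X.T                          # cosine similarity (rows are unit length)
    same = labels[:, None] == labels[None, :]
    thr = np.where(same, in_thr, out_thr)  # in-cluster vs. out-cluster threshold
    return np.where(sim >= thr, W, 0.0)

def expand_query(query, terms, X, k):
    """Append the k nearest terms (by embedding similarity) of each query term."""
    sim = X @ X.T
    expanded = list(query)
    for t in query:
        i = terms.index(t)
        nearest = [j for j in np.argsort(-sim[i]) if j != i][:k]
        expanded += [terms[j] for j in nearest if terms[j] not in expanded]
    return expanded

# Hypothetical toy graph: two tightly linked term groups joined by two weak edges.
terms = ["graph", "node", "edge", "query", "search", "rank"]
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0),
                (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1), (0, 5, 0.1)]:
    W[i, j] = W[j, i] = w

X = spectral_embed(W, dim=2)
labels, centroids = kmeans(X, k=2)
pruned = prune(W, X, labels, in_thr=0.9, out_thr=0.5)
expanded = expand_query(["graph"], terms, X, k=2)
```

On this toy graph the two weak cross-cluster edges fall below `out_thr` and are removed, while the in-cluster edges survive. Because the embeddings are computed once, query expansion needs only a similarity lookup at retrieval time, which is consistent with the low overhead the abstract reports.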
Pages: 15