Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model

Cited by: 0
Authors
Kalogeropoulos, Nikitas-Rigas [1]
Kontogiannis, George [1]
Makris, Christos [1]
Affiliation
[1] Comp Engn & Informat Dept, Univ Campus, Patras 26504, Achaia, Greece
Keywords
Information retrieval; Information retrieval models; Set-based model; Graphical representation of textual data; Clustering; Spectral clustering; Graph embeddings
DOI
10.1016/j.eswa.2024.125771
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a straightforward yet novel approach to enhancing graph-based information retrieval (IR) models, both by calibrating the relationships between term nodes, which improves evaluation metrics at the retrieval phase, and by reducing the total size of the graph. This is achieved by integrating spectral clustering, embedding-based graph pruning and term re-weighting. Spectral clustering assigns each term to a specific cluster, which allows us to propose two pruning methods based on node similarities: in-cluster pruning, which removes edges between terms within the same cluster, and out-cluster pruning, which removes edges that connect different clusters. Both methods use spectral embeddings to assess node similarities, resulting in more manageable clusters termed concepts. These concepts are likely to contain semantically similar terms, and each term's concept is defined as the centroid of its cluster. We show that this graph pruning strategy significantly improves the performance and effectiveness of the overall model while also making its graph sparser. Moreover, during the retrieval phase, the conceptually calibrated centroids are used to re-weight the terms extracted from user queries, and the precomputed embeddings enable efficient query expansion through a k-nearest neighbors (k-NN) approach, offering substantial improvement at minimal additional time cost. To the best of our knowledge, this is the first application of spectral clustering and embedding-based conceptualization to the pruning of graph-based IR models. Our approach improves both retrieval efficiency and effectiveness while enabling query expansion with minimal additional computational overhead. The proposed technique is applied across various graph-based IR models, improving evaluation metrics and producing sparser graphs.
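The abstract describes the pipeline only in outline. As a rough illustration, the NumPy sketch below shows three of the named steps on a toy term graph: spectral clustering (an embedding from the symmetric normalized graph Laplacian, row-normalized and then clustered with plain k-means, i.e. the common Ng-Jordan-Weiss recipe, assumed here rather than taken from the paper), in-cluster/out-cluster edge pruning by cosine-similarity thresholds, and k-NN query expansion over the precomputed embeddings. All function names, the thresholds `tau_in`/`tau_out`, and the toy graph are hypothetical; this is not the authors' implementation, and the centroid-based query re-weighting is omitted because the abstract does not state its formula.

```python
import numpy as np


def spectral_embedding(adj: np.ndarray, dim: int) -> np.ndarray:
    """Embed nodes with the `dim` smallest eigenvectors of the normalized Laplacian."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    # L_sym = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues are returned in ascending order
    emb = vecs[:, :dim]
    # Row-normalize so that a dot product between rows is a cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)


def kmeans(x: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Lloyd's k-means (iters >= 1) with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def prune_edges(adj, emb, labels, tau_in, tau_out):
    """Zero out edges whose endpoints have low embedding similarity.

    Hypothetical thresholds: tau_in for edges inside a cluster,
    tau_out for edges that cross between clusters.
    """
    sim = emb @ emb.T
    same_cluster = labels[:, None] == labels[None, :]
    threshold = np.where(same_cluster, tau_in, tau_out)
    return np.where(sim >= threshold, adj, 0.0)


def expand_query(query_ids, emb, k):
    """Add each query term's k nearest neighbours (cosine) in embedding space."""
    sim = emb[query_ids] @ emb.T
    sim[:, query_ids] = -np.inf  # never re-add a term already in the query
    expanded = set(query_ids)
    for row in sim:
        expanded.update(np.argsort(-row)[:k].tolist())
    return sorted(expanded)


def toy_graph() -> np.ndarray:
    """Hypothetical term graph: two 4-term cliques joined by one weak edge."""
    adj = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i, j] = 1.0
    adj[3, 4] = adj[4, 3] = 0.1
    return adj


if __name__ == "__main__":
    adj = toy_graph()
    emb = spectral_embedding(adj, dim=2)
    labels = kmeans(emb, k=2)
    pruned = prune_edges(adj, emb, labels, tau_in=0.3, tau_out=0.6)
    print("clusters:", labels.tolist())
    print("edges before/after:", int((adj > 0).sum()) // 2, int((pruned > 0).sum()) // 2)
    print("expanded query:", expand_query([0], emb, k=3))
```

On the toy graph the two cliques land in separate clusters, the weak bridge between terms 3 and 4 has near-zero embedding similarity and is pruned by the out-cluster threshold, and expanding the query `[0]` with `k=3` pulls in the other terms of its clique. Keeping two thresholds lets edges inside a concept be retained more leniently than edges that cross concept boundaries.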
Pages: 15
Related papers (50 in total)
  • [41] Research on Bayesian Network Retrieval Model Based on Query Expansion
    Zhao, Shuang
    Wu, Hong-Xia
    Lin, Yong-Min
    EMERGING RESEARCH IN ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, 2012, 315 : 291 - 295
  • [42] Query Expansion Based on a Feedback Concept Model for Microblog Retrieval
    Wang, Yashen
    Huang, Heyan
    Feng, Chong
    PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'17), 2017, : 559 - 568
  • [43] An information retrieval system based on automatic query expansion and Hopfield network
    Sheng, XW
    Jiang, MH
    PROCEEDINGS OF 2003 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS & SIGNAL PROCESSING, PROCEEDINGS, VOLS 1 AND 2, 2003, : 1624 - 1627
  • [44] Design and implementation of ontology-based query expansion for information retrieval
    Fang Wu
    Guoshi Wu
    Xangling Fu
    RESEARCH AND PRACTICAL ISSUES OF ENTERPRISE INFORMATION SYSTEMS II, VOL 1, 2008, 254 : 293 - +
  • [45] An information retrieval system based on automatic query expansion and hopfield network
    Wang, Lin
    Jiang, Minghu
    Sheng, Xiaowei
    Lu, Yinghua
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 1519 - 1524
  • [46] An improved VSM based information retrieval system and fuzzy query expansion
    Wu, JN
    Tanioka, H
    Wang, SZ
    Pan, DH
    Yamamoto, K
    Wang, ZT
    FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 1, PROCEEDINGS, 2005, 3613 : 537 - 546
  • [47] Design and implementation of ontology-based query expansion for information retrieval
    School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100879, China
    Unknown, 061001, China
    IFIP Advances in Information and Communication Technology, 2007, : 293 - 298
  • [48] QeCSO: Design of hybrid Cuckoo Search based Query expansion model for efficient information retrieval
    Lilian, J. Felicia
    Sundarakantham, K.
    Shalinie, S. Mercy
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2021, 46 (03):
  • [50] Graph-based Retrieval Model for Semi-structured Data
    Park, Juneyoung
    Yi, Mun Y.
    2016 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2016, : 361 - 364