Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model

Cited by: 0
Authors
Kalogeropoulos, Nikitas-Rigas [1]
Kontogiannis, George [1]
Makris, Christos [1]
Affiliations
[1] Comp Engn & Informat Dept, Univ Campus, Patras 26504, Achaia, Greece
Keywords
Information retrieval; Information retrieval models; Set-based model; Graphical representation of textual data; Clustering; Spectral clustering; Graph embeddings
DOI
10.1016/j.eswa.2024.125771
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a straightforward yet novel approach to enhancing graph-based information retrieval (IR) models: it calibrates the relationships between term nodes, which improves evaluation metrics at the retrieval phase, and it reduces the total size of the graph. This is achieved by integrating spectral clustering, embedding-based graph pruning, and term re-weighting. Spectral clustering assigns each term to a specific cluster, allowing us to propose two pruning methods based on node similarities: in-cluster and out-cluster pruning. In-cluster pruning removes edges between terms within the same cluster, while out-cluster pruning removes edges that connect different clusters. Both methods use spectral embeddings to assess node similarities, resulting in more manageable clusters termed concepts. These concepts are likely to contain semantically similar terms, with each term's concept defined as the centroid of its cluster. We show that this graph pruning strategy significantly enhances the performance and effectiveness of the overall model while also making its graph sparser. Moreover, during the retrieval phase, the conceptually calibrated centroids are used to re-weight terms generated by user queries, and the precomputed embeddings enable efficient query expansion through a k-Nearest Neighbors (k-NN) approach, offering substantial enhancement with minimal additional time cost. To the best of our knowledge, this is the first application of spectral clustering and embedding-based conceptualization to prune graph-based IR models. Our approach enhances both retrieval efficiency and performance while enabling effective query expansion with minimal additional computational overhead. The proposed technique is applied across various graph-based IR models, improving evaluation metrics and producing sparser graphs.
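The pipeline outlined in the abstract (spectral embedding of the term graph, cluster assignment, in-cluster and out-cluster pruning, and k-NN query expansion) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the function names `spectral_embedding`, `prune_edges` and `knn_expand`, the use of Euclidean distance with the thresholds `max_in` and `max_out`, and the two-way split by the sign of the first non-trivial eigenvector (a stand-in for a full clustering step) are all assumptions made for illustration. Centroid-based term re-weighting is omitted because the abstract does not give its formula.

```python
import numpy as np

def spectral_embedding(adj, dim=1):
    """Embed nodes with the first `dim` non-trivial eigenvectors of the
    symmetric normalized Laplacian. Assumes every node has positive degree."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)      # eigenvalues come back in ascending order
    return vecs[:, 1:dim + 1]          # skip the trivial first eigenvector

def prune_edges(adj, emb, labels, max_in, max_out):
    """Drop an edge when its endpoints are too far apart in embedding space.
    Hypothetical rule: `max_in` applies inside a cluster, `max_out` across clusters."""
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    same_cluster = labels[:, None] == labels[None, :]
    limit = np.where(same_cluster, max_in, max_out)
    return np.where(dist <= limit, adj, 0.0)

def knn_expand(emb, query_terms, k):
    """Return up to k nearest neighbours (by embedding distance) of each query term."""
    expanded = []
    for q in query_terms:
        dist = np.linalg.norm(emb - emb[q], axis=1)
        dist[q] = np.inf               # never return the query term itself
        for t in np.argsort(dist)[:k]:
            t = int(t)
            if t not in query_terms and t not in expanded:
                expanded.append(t)
    return expanded

# Toy term graph (illustrative only): two triangles joined by one weak edge.
adj = np.zeros((6, 6))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
for i, j, w in edges:
    adj[i, j] = adj[j, i] = w

emb = spectral_embedding(adj, dim=1)
labels = (emb[:, 0] > 0).astype(int)   # two clusters from the eigenvector's sign
pruned = prune_edges(adj, emb, labels, max_in=0.1, max_out=0.5)
expanded = knn_expand(emb, [0], k=2)   # expand a query consisting of term 0
```

On this toy graph the weak bridge between the two triangles is the only edge removed, and the query on term 0 is expanded with the other two terms of its triangle. Because the embeddings are computed once up front, the expansion step costs only a distance computation and a sort per query term, which is consistent with the abstract's claim of minimal additional time cost.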
Pages: 15