Incremental cluster-based retrieval using compressed cluster-skipping inverted files

Cited by: 22
Authors
Altingovde, Ismail Sengor [1]
Demir, Engin [1]
Can, Fazli [1]
Ulusoy, Oezguer [1]
Affiliations
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Keywords
experimentation; measurement; performance; best match; cluster-based retrieval (CBR); cluster-skipping inverted index structure (CS-IIS); full search (FS); index compression; inverted index structure (IIS); query processing
DOI
10.1145/1361684.1361688
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both the best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
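The query-evaluation loop described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the block layout `(cluster_id, centroid_weight, postings)`, the function name `incremental_cbr`, the parameters `best_clusters` and `top_docs`, the additive scoring, and the sample data are all assumptions made for illustration, and compression is left out entirely.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: each term's posting list is split into one block per
# cluster, and each block carries the centroid weight of the term in that
# cluster followed by the (doc_id, term_frequency) pairs of its members.
Block = Tuple[int, float, List[Tuple[int, int]]]
Index = Dict[str, List[Block]]


def incremental_cbr(index: Index, query: List[str],
                    best_clusters: int, top_docs: int) -> List[Tuple[int, float]]:
    """Rank documents while re-selecting the best clusters after every term."""
    cluster_acc: Dict[int, float] = defaultdict(float)
    doc_acc: Dict[int, float] = defaultdict(float)

    for term in query:
        blocks = index.get(term, [])

        # Update cluster scores from the centroid information in the blocks.
        for cluster_id, centroid_weight, _ in blocks:
            cluster_acc[cluster_id] += centroid_weight

        # Recompute the current best clusters; they may change per term.
        ranked_clusters = sorted(cluster_acc, key=lambda c: (-cluster_acc[c], c))
        best = set(ranked_clusters[:best_clusters])

        # Score only documents in blocks of the best clusters; skip the rest.
        for cluster_id, _, postings in blocks:
            if cluster_id not in best:
                continue
            for doc_id, tf in postings:
                doc_acc[doc_id] += tf  # assumed scoring: raw term frequency

    # Simplification: scores accumulated under earlier best-cluster choices
    # are kept even if that cluster later drops out of the best set.
    ranked_docs = sorted(doc_acc.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked_docs[:top_docs]


# Made-up sample data: two terms, two clusters.
sample_index: Index = {
    "cluster": [(0, 0.9, [(1, 2), (2, 1)]), (1, 0.1, [(7, 5)])],
    "index": [(0, 0.5, [(2, 3)]), (1, 0.6, [(7, 1), (8, 4)])],
}

result = incremental_cbr(sample_index, ["cluster", "index"],
                         best_clusters=1, top_docs=2)
```

With `best_clusters=1`, only cluster 0 is ever selected in this sample, so the blocks holding documents 7 and 8 are skipped for both terms. In a compressed index the benefit would come from not decoding the skipped blocks, which this uncompressed sketch does not model.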
Pages: 36
Related papers
50 items in total
  • [31] CLUE: Cluster-based retrieval of images by unsupervised learning
    Chen, YX
    Wang, JZ
    Krovetz, R
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2005, 14 (08) : 1187 - 1201
  • [32] Cluster-based Partial Dense Retrieval Fused with Sparse Text Retrieval
    Yang, Yingrui
    Carlson, Parker
    He, Shanxiu
    Qiao, Yifan
    Yang, Tao
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2327 - 2331
  • [33] Cluster-based query expansion using external collections in medical information retrieval
    Oh, Heung-Seon
    Jung, Yuchul
    JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 58 : 70 - 79
  • [34] Cluster-Based News Representative Generation with Automatic Incremental Clustering
    Shabirin, Irsal
    Barakbah, Ali Ridho
    Syarif, Iwan
    EMITTER-INTERNATIONAL JOURNAL OF ENGINEERING TECHNOLOGY, 2019, 7 (02) : 467 - 479
  • [35] Fast and effective cluster-based information retrieval using frequent closed itemsets
    Djenouri, Youcef
    Belhadi, Asma
    Fournier-Viger, Philippe
    Lin, Jerry Chun-Wei
    INFORMATION SCIENCES, 2018, 453 : 154 - 167
  • [36] Interactive Cluster-Based Personalized Retrieval on Large Document Collections
    Belsis, Petros
    Konstantopoulos, Charalampos
    Mamalis, Basilis
    Pantziou, Grammati
    Skourlas, Christos
    NEW DIRECTIONS IN INTELLIGENT INTERACTIVE MULTIMEDIA, 2008, 142 : 211 - +
  • [37] Incremental transitivity applied to cluster retrieval
    Hasan, Yaser
    Hassan, Muhammad
    Ridley, Mick
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2008, 5 (03) : 311 - 319
  • [38] Hybrid Indexing for Versioned Document Search with Cluster-based Retrieval
    Jin, Xin
    Agun, Daniel
    Yang, Tao
    Wu, Qinghao
    Shen, Yifan
    Zhao, Susen
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 377 - 386
  • [39] SOPHIA: An interactive cluster-based retrieval system for the OHSUMED collection
    Dobrynin, V
    Patterson, D
    Galushka, M
    Rooney, N
    IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, 2005, 9 (02) : 256 - 265
  • [40] Cluster-based polyrepresentation as science modelling approach for information retrieval
    Abbasi, Muhammad Kamran
    Frommholz, Ingo
    SCIENTOMETRICS, 2015, 102 (03) : 2301 - 2322