Incremental cluster-based retrieval using compressed cluster-skipping inverted files

Cited by: 22
Authors
Altingovde, Ismail Sengor [1]
Demir, Engin [1]
Can, Fazli [1]
Ulusoy, Oezguer [1]
Affiliation
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Keywords
experimentation; measurement; performance; best match; cluster-based retrieval (CBR); cluster-skipping inverted index structure (CS-IIS); full search (FS); index compression; inverted index structure (IIS); query processing
DOI
10.1145/1361684.1361688
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
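The abstract describes the index layout and the query loop only in prose. The Python sketch below illustrates one way the idea could look, under simplifying assumptions that are not taken from the paper: posting lists are uncompressed in-memory lists (compression is omitted entirely); each cluster header carries a precomputed centroid weight and a skip pointer (the list index of the next header); document weights are plain numbers summed into scores; and n_best (how many best clusters to keep), top_k, and all function names are hypothetical. Each term's list is fetched once and then scanned twice in memory: first its cluster headers, to update cluster scores and recompute the best clusters, then the document postings of only those clusters.

```python
from collections import defaultdict
import heapq

# Hypothetical in-memory layout of a cluster-skipping posting list for one term:
#   ("C", cluster_id, centroid_weight, skip_to)   cluster header
#   ("D", doc_id, weight)                         document posting, repeated
# skip_to is the list index of the next cluster header (or the list length),
# so a reader can jump over a whole cluster's documents in one step.


def build_cs_postings(doc_weights, doc_cluster, centroid_weights):
    """Build one term's posting list, grouping documents by cluster.

    doc_weights:      {doc_id: weight of the term in that document}
    doc_cluster:      {doc_id: cluster_id}
    centroid_weights: {cluster_id: weight of the term in that cluster's centroid}
    """
    by_cluster = defaultdict(list)
    for doc_id, w in doc_weights.items():
        by_cluster[doc_cluster[doc_id]].append((doc_id, w))
    postings = []
    for cid in sorted(by_cluster):
        docs = sorted(by_cluster[cid])
        skip_to = len(postings) + 1 + len(docs)  # header + this cluster's docs
        postings.append(("C", cid, centroid_weights[cid], skip_to))
        postings.extend(("D", doc_id, w) for doc_id, w in docs)
    return postings


def incremental_cbr(query_terms, index, n_best, top_k):
    """Score documents term by term, recomputing the best clusters after each term."""
    cluster_score = defaultdict(float)
    doc_score = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, [])  # one fetch of the term's list

        # Scan 1: visit only cluster headers, jumping via skip pointers.
        i = 0
        while i < len(postings):
            _, cid, centroid_w, skip_to = postings[i]
            cluster_score[cid] += centroid_w
            i = skip_to

        # Recompute the current best clusters; this set may change per term.
        best = set(heapq.nlargest(n_best, cluster_score, key=cluster_score.get))

        # Scan 2: score documents of best clusters, skip all other clusters.
        i = 0
        while i < len(postings):
            _, cid, _, skip_to = postings[i]
            if cid in best:
                for _, doc_id, w in postings[i + 1:skip_to]:
                    doc_score[doc_id] += w
            i = skip_to

    return heapq.nlargest(top_k, doc_score.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy collection: docs 1-2 in cluster 0, docs 3-4 in cluster 1, doc 5 in cluster 2.
    doc_cluster = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
    index = {
        "a": build_cs_postings({1: 1.0, 2: 2.0, 3: 1.0}, doc_cluster, {0: 1.5, 1: 0.5}),
        "b": build_cs_postings({3: 3.0, 4: 1.0, 5: 2.0}, doc_cluster, {1: 2.0, 2: 2.0}),
    }
    print(incremental_cbr(["a", "b"], index, n_best=1, top_k=2))
    # [(3, 3.0), (2, 2.0)]
```

In the toy run, document 3 ends with a score of 3.0 rather than 4.0: its cluster was not among the best clusters when term "a" was processed, so that posting was skipped. This illustrates the trade-off of letting the best clusters change from term to term; the paper's actual weighting scheme, term ordering, and compression are not modeled here.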
Pages: 36
Related papers (50 in total)
  • [1] Exploiting Cluster-Skipping Inverted Index for Semantic Place Retrieval
    Cinar, Enes Recep
    Altingovde, Ismail Sengor
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1981 - 1985
  • [2] An approach for document retrieval using cluster-based inverted indexing
    Chandwani, Gunjan
    Ahlawat, Anil
    Dubey, Gaurav
    JOURNAL OF INFORMATION SCIENCE, 2023, 49 (03) : 726 - 739
  • [3] Cluster-Based Focused Retrieval
    Sheetrit, Eilon
    Kurland, Oren
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2305 - 2308
  • [4] Cluster-based patent retrieval
    Kang, In-Su
    Na, Seung-Hoon
    Kim, Jungi
    Lee, Jong-Hyeok
    INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (05) : 1173 - 1182
  • [5] Cluster-based delta compression of a collection of files
    Ouyang, Z
    Memon, N
    Suel, T
    Trendafilov, D
    WISE 2002: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, 2002, : 257 - 266
  • [6] Cluster-based information retrieval using pattern mining
    Djenouri, Youcef
    Belhadi, Asma
    Djenouri, Djamel
    Lin, Jerry Chun-Wei
    APPLIED INTELLIGENCE, 2021, 51 (04) : 1888 - 1903
  • [7] Selective Cluster-Based Document Retrieval
    Levi, Or
    Raiber, Fiana
    Kurland, Oren
    Guy, Ido
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1473 - 1482
  • [8] A novel cluster-based image retrieval
    Lotfy, HM
    Elmaghraby, AS
    PROCEEDINGS OF THE FOURTH IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, 2004, : 338 - 341
  • [9] An incremental cluster-based approach to spam filtering
    Hsiao, Wen-Feng
    Chang, Te-Min
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (03) : 1599 - 1608