Incremental cluster-based retrieval using compressed cluster-skipping inverted files

Cited by: 22
Authors
Altingovde, Ismail Sengor [1]
Demir, Engin [1]
Can, Fazli [1]
Ulusoy, Oezguer [1]
Affiliation
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Keywords
experimentation; measurement; performance; best match; cluster-based retrieval (CBR); cluster-skipping inverted index structure (CS-IIS); full search (FS); index compression; inverted index structure (IIS); query processing;
DOI
10.1145/1361684.1361688
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
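The query evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory layout (each posting list as a list of per-cluster blocks holding a centroid weight and the document postings), the simple weighted-sum scoring, and the names `incremental_cbr`, `example_index`, `num_best_clusters`, and `num_best_docs` are all assumptions made for illustration, and compression is left out entirely.

```python
from collections import defaultdict
import heapq

# Hypothetical layout (an assumption, not the paper's on-disk format):
# term -> list of blocks, one block per cluster that contains the term.
# Each block is (cluster_id, centroid_weight, [(doc_id, doc_weight), ...]).
example_index = {
    "cluster": [(0, 0.9, [(1, 0.5), (2, 0.4)]), (1, 0.1, [(7, 0.3)])],
    "index": [(0, 0.2, [(2, 0.6)]), (1, 0.8, [(7, 0.9), (8, 0.2)])],
}


def incremental_cbr(index, query, num_best_clusters, num_best_docs):
    """Rank documents while re-selecting the best clusters after every term.

    `query` maps each term to its query weight; terms are processed in
    the dictionary's insertion order.
    """
    cluster_scores = defaultdict(float)
    doc_scores = defaultdict(float)

    for term, q_weight in query.items():
        # The posting list is fetched once per query term.
        blocks = index.get(term, [])

        # Update cluster scores from the centroid weights in the blocks.
        for cluster_id, centroid_weight, _ in blocks:
            cluster_scores[cluster_id] += q_weight * centroid_weight

        # Recompute the best clusters; the set can change from term to term.
        best = set(
            heapq.nlargest(
                num_best_clusters, cluster_scores, key=cluster_scores.get
            )
        )

        # Match documents only inside the current best clusters.
        for cluster_id, _, postings in blocks:
            if cluster_id not in best:
                continue  # skip this cluster's portion of the posting list
            for doc_id, doc_weight in postings:
                doc_scores[doc_id] += q_weight * doc_weight

    # Highest score first; ties are broken by the smaller document id.
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:num_best_docs]


if __name__ == "__main__":
    print(incremental_cbr(example_index, {"cluster": 1.0, "index": 1.0}, 1, 2))
    print(incremental_cbr(example_index, {"cluster": 1.0, "index": 2.0}, 1, 2))
```

In the second call the heavier weight on "index" moves the best cluster from 0 to 1 once that term is processed, so documents scored earlier in cluster 0 keep their partial scores while later matching happens only in cluster 1. This is a simplification of the "best clusters can dynamically change" behavior in the abstract, shown under the assumptions above.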
Pages: 36
Related papers
50 in total
  • [21] Multimedia Information Retrieval Using Fuzzy Cluster-Based Model Learning
    Sattari, Saeid
    Yazici, Adnan
    2017 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2017,
  • [22] An Approach for Cluster-Based Retrieval of Tests Using Cover-Coefficients
    Subramaniam, Mahadevan
    Chundi, Parvathi
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2015, 25 (06) : 1033 - 1052
  • [23] Cluster-based patent retrieval using international patent classification system
    Kim, Jungi
    Kang, In-Su
    Lee, Jong-Hyeok
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 205 - +
  • [24] Algorithms for within-cluster searches using inverted files
    Altingovde, Ismail Sengor
    Can, Fazli
    Ulusoy, Ozgur
    COMPUTER AND INFORMATION SCIENCES - ISCIS 2006, PROCEEDINGS, 2006, 4263 : 707 - +
  • [25] Incremental anomaly detection using two-layer cluster-based structure
    Bigdeli, Elnaz
    Mohammadi, Mandi
    Raahemi, Bijan
    Matwin, Stan
    INFORMATION SCIENCES, 2018, 429 : 315 - 331
  • [26] Efficiency and effectiveness of query processing in cluster-based retrieval
    Can, F
    Altingövde, IS
    Demir, E
    INFORMATION SYSTEMS, 2004, 29 (08) : 697 - 717
  • [27] Dynamic Cluster-based Retrieval and Discovery for Biomedical Literature
    Ortiz, Michael Segundo
    Kim, Heejun
    Wang, Mengqian
    Seki, Kazuhiro
    Mostafa, Javed
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 390 - 396
  • [28] Structural re-ranking with cluster-based retrieval
    Na, Seung-Hoon
    Kang, In-Su
    Lee, Jong-Hyeok
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 658 - +
  • [29] Measuring the Effects of Summarization in Cluster-based Information Retrieval
    Curiel, Arturo
    Gutierrez-Soto, Claudio
    Soto-Borquez, Pablo-Nicolas
    Galdames, Patricio
    2020 39TH INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC), 2020,
  • [30] Introducing an active cluster-based information retrieval paradigm
    Loureiro, O
    Siegelmann, H
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2005, 56 (10) : 1024 - 1030