Query-driven indexing for scalable peer-to-peer text retrieval

Cited by: 11
Authors
Skobeltsyn, Gleb [1]
Luu, Toan
Zarko, Ivana Podnar [2]
Rajman, Martin
Aberer, Karl
Affiliations
[1] Ecole Polytech Fed Lausanne, Sch Comp & Commun Sci, IC, CH-1015 Lausanne, Switzerland
[2] Univ Zagreb, Fac Elect Engn & Comp, Zagreb 41000, Croatia
Keywords
P2P; DHT; IR; Text retrieval; P2PIR; Scalability; Query-driven indexing; Distributed index; Index updates
DOI
10.1016/j.future.2008.03.006
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we present a query-driven indexing/retrieval strategy for efficient full-text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that are frequently present in user queries, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles the addition of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for Web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence about the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval. (c) 2008 Elsevier B.V. All rights reserved.
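The two properties named in the abstract can be illustrated with a small single-machine sketch. It is not the authors' algorithm: the function names `popular_keys` and `build_index`, the thresholds `MIN_QUERY_COUNT` and `MAX_POSTINGS`, and the toy term-frequency score are all assumptions made for illustration, and the DHT layer that would spread keys across peers is left out.

```python
from collections import Counter
from itertools import combinations

MAX_POSTINGS = 2      # hypothetical bound on posting-list length
MIN_QUERY_COUNT = 2   # hypothetical popularity threshold for a key


def popular_keys(query_log, min_count=MIN_QUERY_COUNT, max_terms=2):
    """Property (1): keep only term combinations that occur often in queries."""
    counts = Counter()
    for query in query_log:
        terms = sorted(set(query.lower().split()))
        for size in range(1, max_terms + 1):
            for combo in combinations(terms, size):
                counts[combo] += 1
    return {key for key, n in counts.items() if n >= min_count}


def build_index(docs, keys, max_postings=MAX_POSTINGS):
    """Property (2): store at most max_postings top-ranked documents per key.

    The score is a toy one (summed term frequency), assumed for illustration.
    """
    index = {}
    for key in keys:
        scored = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            tf = [words.count(term) for term in key]
            if all(tf):  # the document must contain every term of the key
                scored.append((sum(tf), doc_id))
        # Highest score first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        index[key] = [doc_id for _, doc_id in scored[:max_postings]]
    return index


query_log = ["p2p retrieval", "p2p retrieval index", "index", "music"]
docs = {
    "d1": "p2p retrieval p2p",
    "d2": "p2p index",
    "d3": "retrieval of p2p retrieval data",
    "d4": "p2p p2p p2p",
    "d5": "p2p",
}
keys = popular_keys(query_log)
index = build_index(docs, keys)
```

In this sketch the rare query term "music" never becomes a key, and no posting list grows beyond `MAX_POSTINGS` entries, which mirrors the bounded index size and bounded transfer size that the abstract describes.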
Pages: 89 - 99
Number of pages: 11
Related papers
50 in total
  • [1] Scalable peer-to-peer RDF query algorithm
    Ranger, D
    Cloutier, JF
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2005 WORKSHOPS, PROCEEDINGS, 2005, 3807 : 266 - 274
  • [2] Indexing techniques for file sharing in scalable peer-to-peer networks
    Annexstein, FS
    Berman, KA
    Jovanovic, MA
    Ponnavaikko, K
    [J]. ELEVENTH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, PROCEEDINGS, 2002, : 10 - 15
  • [3] Scalable retrieval and mining with optimal peer-to-peer configuration
    Chen, Jiann-Jone
    Hu, Chia-Jung
    Su, Chun-Rong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2008, 10 (02) : 209 - 220
  • [4] A scalable peer-to-peer system for music information retrieval
    Tzanetakis, G
    Gao, J
    Steenkiste, P
    [J]. COMPUTER MUSIC JOURNAL, 2004, 28 (02) : 24 - 33
  • [5] Efficient and scalable query routing for unstructured peer-to-peer networks
    Kumar, A
    Xu, J
    Zegura, EW
    [J]. IEEE Infocom 2005: The Conference on Computer Communications, Vols 1-4, Proceedings, 2005, : 1162 - 1173
  • [6] A peer-to-peer based text sharing and retrieval system
    Jiang, Qinliang
    Guan, Jihong
    [J]. PROCEEDINGS OF FUTURE GENERATION COMMUNICATION AND NETWORKING, WORKSHOP PAPERS, VOL 2, 2007, : 338 - +
  • [7] Scalable peer-to-peer web retrieval with highly discriminative keys
    Podnar, Ivana
    Rajman, Martin
    Luu, Toan
    Klemm, Fabius
    Aberer, Karl
    [J]. 2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2007, : 1071 - +
  • [8] Exploiting locality for scalable information retrieval in peer-to-peer networks
    Zeinalipour-Yazti, D
    Kalogeraki, V
    Gunopulos, D
    [J]. INFORMATION SYSTEMS, 2005, 30 (04) : 277 - 298
  • [9] Scalable peer-to-peer file sharing with efficient complex query support
    Li, Yan
    Ahuja, Jyoti
    Lao, Li
    Cui, Jun-Hong
    [J]. PROCEEDINGS - 16TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, VOLS 1-3, 2007, : 121 - +
  • [10] Content-based retrieval of music in scalable peer-to-peer networks
    Gao, J
    Tzanetakis, G
    Steenkiste, P
    [J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 309 - 312