Parallel generation of inverted files for distributed text collections

Cited by: 6
Authors
Ribeiro-Neto, BA [1]
Kitajima, JP [1]
Navarro, G [1]
Ana, CRGS [1]
Ziviani, N [1]
Affiliation
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
DOI
10.1109/SCCC.1998.730794
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm targets a high-bandwidth network of workstations with a shared-nothing memory organization. The text collection is assumed to be evenly distributed among the disks of the workstations. Compression is used to save space in main memory (where the inverted lists are kept) and to save time when data must be moved across the network. The algorithm's average running cost is O(t/p), where t is the size of the whole text collection and p is the number of available processors. We implemented the algorithm and obtained experimental results. On a 100 Mbit/s switched Ethernet network with four 200 MHz Pentium Pro processors, each with 128 megabytes of RAM, we were able to invert 2 gigabytes of TREC documents in 15 minutes. We also propose an analytical model for the algorithm's execution time.
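The abstract's two central ideas, inverting each partition of the collection independently and keeping inverted lists compressed, can be illustrated with a small sketch. This is not the authors' algorithm: it runs sequentially, with each loop iteration standing in for one processor, and the abstract does not say which compression scheme the paper uses, so gap encoding with variable-byte codes is an assumption chosen for illustration. All function names are hypothetical.

```python
from collections import defaultdict


def build_local_index(docs):
    """Invert one partition: map each term to a sorted list of document ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return index


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers, 7 data bits per byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)  # high bit marks the last byte of a number
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_postings(doc_ids):
    """Store gaps between sorted, non-empty doc ids, then variable-byte encode."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the sorted doc ids from the encoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


def invert_collection(partitions):
    """Invert each partition independently, then merge the per-term lists."""
    merged = defaultdict(list)
    for part in partitions:  # assumption: one partition per processor
        for term, ids in build_local_index(part).items():
            merged[term].extend(ids)
    return {term: compress_postings(sorted(ids)) for term, ids in merged.items()}


# Toy collection split over two hypothetical workstations.
partitions = [
    {1: "parallel inverted files", 2: "inverted index"},
    {3: "parallel index"},
]
index = invert_collection(partitions)
parallel_docs = decompress_postings(index["parallel"])
```

Because each partition is scanned once and independently, the scanning work per processor in this sketch grows with t/p when the text is evenly distributed, which is the intuition behind the O(t/p) average cost stated in the abstract.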
Pages: 149 - 157
Page count: 5
Related papers
50 in total
  • [1] Parallel methods for the generation of partitioned inverted files
    MacFarlane, A
    McCann, JA
    Robertson, SE
    ASLIB PROCEEDINGS, 2005, 57 (05): : 434 - 459
  • [2] Inverted files versus signature files for text indexing
    Zobel, J
    Moffat, A
    Ramamohanarao, K
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 1998, 23 (04): : 453 - 490
  • [3] Using inverted files to compress text
    Ristov, Strahil
    Journal of Computing and Information Technology, 2002, 10 (03) : 157 - 161
  • [4] Using inverted files to compress text
    Ristov, S
    ITI 2002: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2002, : 443 - 447
  • [5] Inverted files for text search engines
    Zobel, Justin
    Moffat, Alistair
    ACM COMPUTING SURVEYS, 2006, 38 (02)
  • [6] Parallel search using partitioned inverted files
    MacFarlane, A
    McCann, JA
    Robertson, SE
    SPIRE 2000: SEVENTH INTERNATIONAL SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL - PROCEEDINGS, 2000, : 209 - 220
  • [7] Parallel methods for the update of partitioned inverted files
    MacFarlane, A.
    McCann, J. A.
    Robertson, S. E.
    ASLIB PROCEEDINGS, 2007, 59 (4-5): : 367 - 396
  • [8] Distributed Clustering of Text Collections
    Zamora, Juan
    Allende-Cid, Hector
    Mendoza, Marcelo
    IEEE ACCESS, 2019, 7 : 155671 - 155685
  • [9] Efficient distributed algorithms to build inverted files
    Ribeiro-Neto, B
    Moura, ES
    Neubert, MS
    Ziviani, N
    SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 105 - 112
  • [10] Fast concurrency control for distributed inverted files
    Marín, M
    COMPUTATIONAL SCIENCE - ICCS 2005, PT 1, PROCEEDINGS, 2005, 3514 : 411 - 418