Parallel generation of inverted files for distributed text collections

Cited by: 6
Authors
Ribeiro-Neto, BA [1]
Kitajima, JP [1]
Navarro, G [1]
Ana, CRGS [1]
Ziviani, N [1]
Affiliation
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
DOI
10.1109/SCCC.1998.730794
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm targets a high-bandwidth network of workstations with a shared-nothing memory organization. The text collection is assumed to be evenly distributed among the disks of the workstations. Compression is used to save space in main memory (where the inverted lists are kept) and to save time when data must be moved across the network. The algorithm's average running cost is O(t/p), where t is the size of the whole text collection and p is the number of available processors. We implemented the algorithm and evaluated it experimentally. On a 100 Mbit/s switched Ethernet network of four 200 MHz Pentium Pro workstations, each with 128 megabytes of RAM, we were able to invert 2 gigabytes of TREC documents in 15 minutes. We also propose an analytical model for the algorithm's execution time.
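The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes three phases (local inversion on each workstation, exchange of compressed lists, merge at the receiving workstation), assigns each term to an owner with a CRC32 hash, and compresses lists with gap encoding plus variable-byte codes. All function names, the hash-based partitioning, and the choice of compression scheme are assumptions made for illustration; the p workstations are simulated in a single process.

```python
import re
import zlib
from collections import defaultdict

def vbyte_encode(numbers):
    """Variable-byte encode a list of non-negative integers."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)   # low 7 bits, continuation byte
            n >>= 7
        out.append(n | 0x80)       # high bit marks the last byte of a number
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers

def to_gaps(doc_ids):
    """Turn a sorted, non-empty list of ids into first id + differences."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def from_gaps(gaps):
    """Inverse of to_gaps (running sum)."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

def local_invert(docs):
    """Phase 1 (assumed): a worker inverts its own documents {doc_id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def owner(term, p):
    """Hypothetical partitioning rule: a stable hash picks the owning worker."""
    return zlib.crc32(term.encode("utf-8")) % p

def parallel_invert(partitions):
    """Simulate p shared-nothing workers; partitions[i] holds worker i's docs.

    Document ids are assumed to be globally unique across partitions.
    """
    p = len(partitions)
    local = [local_invert(docs) for docs in partitions]

    # Phase 2 (assumed): each worker "sends" compressed lists to the owner.
    inbox = [defaultdict(list) for _ in range(p)]
    for index in local:
        for term, ids in index.items():
            inbox[owner(term, p)][term].append(vbyte_encode(to_gaps(ids)))

    # Phase 3 (assumed): each owner merges what it received and re-compresses.
    result = []
    for received in inbox:
        merged = {}
        for term, chunks in received.items():
            ids = sorted(i for c in chunks for i in from_gaps(vbyte_decode(c)))
            merged[term] = vbyte_encode(to_gaps(ids))
        result.append(merged)
    return result

def lookup(result, term):
    """Fetch and decompress the inverted list of a term from its owner."""
    data = result[owner(term, len(result))].get(term)
    return from_gaps(vbyte_decode(data)) if data else []
```

In this sketch each worker handles roughly t/p of the text during local inversion, which is the intuition behind an O(t/p) average cost when the collection is evenly distributed; the real cost also depends on how evenly the exchanged lists are spread over the workstations.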
Pages: 149-157
Page count: 5