Load Balancing using Consistent Hashing: a Real Challenge for Large Scale Distributed Web Crawlers

Cited by: 3
Authors
Nasri, Mitra [1]
Sharifi, Mohsen [1]
Affiliations
[1] Iran Univ Sci & Technol, Dept Comp Engn, Tehran, Iran
DOI
10.1109/WAINA.2009.96
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Large-scale search engines now use distributed Web crawlers to collect Web pages because it is impractical for a single machine to download the entire Web. Load balancing of such crawlers is an important task because the memory and other resources of each crawling machine are limited. Existing distributed crawlers use simple URL hashing based on site names as their partitioning policy. This can be done in a distributed environment using consistent hashing to dynamically manage the joining and leaving of crawling nodes. This method is formally claimed to be load balanced when the hashing method is uniform. Given that the Web structure follows a power-law distribution according to existing statistics, we argue that a uniform random hash function based on a site's URL cannot be load balanced for large-scale distributed Web crawlers. We support this claim by applying Web statistics to consistent hashing as it is used in one well-known Web crawler. We also report some experimental results that demonstrate the effect on load balancing of relying only on the hash of host names.
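The partitioning scheme the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' experimental setup: the MD5 hash, the number of virtual nodes per crawler, the Pareto-distributed site sizes (shape `alpha`), and the `siteN.example` host names are all assumptions chosen for illustration. The sketch assigns hosts to crawling nodes by consistent hashing of the host name and then compares how evenly hosts and pages are spread.

```python
import bisect
import hashlib
import random


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit point on the ring (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Consistent hashing with virtual nodes; keys go to the next point clockwise."""

    def __init__(self, nodes, replicas=100):
        # Each crawler contributes `replicas` points to smooth out the ring.
        self._points = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, host: str) -> str:
        # First ring point strictly after the host's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, ring_hash(host)) % len(self._keys)
        return self._points[idx][1]


def simulate(num_hosts=20000, num_nodes=16, alpha=1.2, seed=0):
    """Assign hypothetical hosts with heavy-tailed page counts to crawlers."""
    rng = random.Random(seed)
    nodes = [f"crawler-{i}" for i in range(num_nodes)]
    ring = ConsistentHashRing(nodes)
    hosts_per_node = {node: 0 for node in nodes}
    pages_per_node = {node: 0 for node in nodes}
    for i in range(num_hosts):
        host = f"site{i}.example"                 # hypothetical host name
        pages = int(rng.paretovariate(alpha))     # heavy-tailed site size, >= 1
        node = ring.node_for(host)
        hosts_per_node[node] += 1
        pages_per_node[node] += pages
    return hosts_per_node, pages_per_node


if __name__ == "__main__":
    hosts, pages = simulate()
    print("hosts max/min:", max(hosts.values()) / min(hosts.values()))
    print("pages max/min:", max(pages.values()) / min(pages.values()))
```

Comparing the two printed ratios shows how the spread of pages relates to the spread of hosts under these assumed parameters; the actual figures depend on `alpha`, the seed, and the number of nodes.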
Pages: 715 - 720
Page count: 6
Related papers
50 in total
  • [21] DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching
    Liu, Zaoxing
    Bai, Zhihao
    Liu, Zhenming
    Li, Xiaozhou
    Kim, Changhoon
    Braverman, Vladimir
    Jin, Xin
    Stoica, Ion
    PROCEEDINGS OF THE 17TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2019, : 143 - 157
  • [22] The Potential of Diffusive Load Balancing at Large Scale
    Lieber, Matthias
    Goessner, Kerstin
    Nagel, Wolfgang E.
    PROCEEDINGS OF THE 23RD EUROPEAN MPI USERS' GROUP MEETING (EUROMPI 2016), 2016, : 154 - 157
  • [23] Research Load Balancing Technology of Distributed Database Based on Consistent Hash
    Gong, Wenbo
    2019 THE 3RD INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPILATION, COMPUTING AND COMMUNICATIONS (HP3C 2019), 2019, : 95 - 99
  • [24] Dynamic Load Balancing Using Grid Services for HLA-Based Simulations on Large-Scale Distributed Systems
    Boukerche, Azzedine
    de Grande, Robson Eduardo
    13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL-TIME APPLICATIONS, PROCEEDINGS, 2009, : 175 - 183
  • [25] Toolbox: Balancing Traffic Load in Globally Distributed Web Sites
    L. Lawrence Ho
    Journal of Network and Systems Management, 2000, 8 (2) : 305 - 308
  • [26] Dynamic load balancing in geographically distributed heterogeneous Web servers
    Colajanni, M
    Yu, PS
    Cardellini, V
    18TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 1998, : 295 - 302
  • [27] ALBL: an adaptive load balancing algorithm for distributed web systems
    Kontogiannis, Sotirios
    Karakos, Alexandros
    INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2014, 13 (02) : 144 - 168
  • [28] Load Balancing in Distributed Web Caching: A Novel Clustering Approach
    Tiwari, R.
    Kumar, K.
    Khan, G.
    INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN SCIENCE AND TECHNOLOGY (ICM2ST-10), 2010, 1324 : 341 - +
  • [29] Enhancing Load Balancing Efficiency Based on Migration Delay for Large-Scale Distributed Simulations
    Alghamdi, Turki G.
    De Grande, Robson Eduardo
    Boukerche, Azzedine
    2015 IEEE/ACM 19TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), 2015, : 33 - 40
  • [30] Supervised Distributed Hashing for Large-Scale Multimedia Retrieval
    Zhai, Deming
    Liu, Xianming
    Ji, Xiangyang
    Zhao, Debin
    Satoh, Shin'ichi
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (03) : 675 - 686