Load Balancing using Consistent Hashing: a Real Challenge for Large Scale Distributed Web Crawlers

Cited by: 3
Authors
Nasri, Mitra [1]
Sharifi, Mohsen [1]
Affiliations
[1] Iran Univ Sci & Technol, Dept Comp Engn, Tehran, Iran
DOI
10.1109/WAINA.2009.96
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Large-scale search engines now use distributed Web crawlers to collect Web pages because it is impractical for a single machine to download the entire Web. Load balancing such crawlers is important because each crawling machine has limited memory and other resources. Existing distributed crawlers use simple URL hashing based on site names as their partitioning policy. In a distributed environment, this can be implemented with consistent hashing, which dynamically handles crawling nodes joining and leaving. The method is formally claimed to be load balanced when the hash function is uniform. Given that, according to existing statistics, the structure of the Web follows a power-law distribution, we argue that a uniform random hash function based on a site's URL cannot be load balanced for large-scale distributed Web crawlers. We support this claim by applying Web statistics to consistent hashing as it is used in one well-known Web crawler. We also report experimental results that demonstrate the effect on load balancing when we rely only on a hash of host names.
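The argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or the crawler they analyze: the `HashRing` class, the `ring_point` helper, the `replicas` parameter, the `siteN.example` host names, and the synthetic page counts (roughly `top_pages / rank`, a stand-in for a power-law distribution) are all assumptions made for illustration. The sketch assigns host names to crawling nodes with consistent hashing and then counts both hosts and pages per node.

```python
import bisect
import hashlib
from collections import Counter


def ring_point(key: str) -> int:
    """Map a string to a point on the ring (first 64 bits of MD5; an arbitrary choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes (replicas)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.nodes = set(nodes)
        self._rebuild()

    def _rebuild(self):
        # Each node owns `replicas` points; a key belongs to the next point clockwise.
        self._ring = sorted(
            (ring_point(f"{node}#{i}"), node)
            for node in self.nodes
            for i in range(self.replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def add(self, node):
        self.nodes.add(node)
        self._rebuild()

    def remove(self, node):
        self.nodes.discard(node)
        self._rebuild()

    def lookup(self, host: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._keys, ring_point(host)) % len(self._ring)
        return self._ring[idx][1]


def simulate(num_nodes=8, num_hosts=10_000, top_pages=1_000_000):
    """Assign synthetic hosts to nodes; return the ring plus per-node host and page counts."""
    ring = HashRing([f"crawler-{i}" for i in range(num_nodes)])
    host_load, page_load = Counter(), Counter()
    for rank in range(1, num_hosts + 1):
        owner = ring.lookup(f"site{rank}.example")
        host_load[owner] += 1
        # Assumed power-law-like site sizes: the rank-r host has about top_pages / r pages.
        page_load[owner] += max(1, top_pages // rank)
    return ring, host_load, page_load


if __name__ == "__main__":
    ring, host_load, page_load = simulate()
    for node in sorted(ring.nodes):
        print(node, "hosts:", host_load[node], "pages:", page_load[node])
```

Under these assumptions, the number of hosts per node tends to be fairly even, while the number of pages per node is dominated by whichever nodes happen to receive the largest sites. Hashing on host names balances hosts, not the work each host represents.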
Pages: 715-720
Page count: 6