The Design and Implementation of a High-efficiency Distributed Web Crawler

Cited by: 3
Author
Pu, Qiumei [1]
Affiliation
[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
Keywords
big data; distributed technology; web crawler
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2016.34
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet, the amount of data online has grown enormously, and website technology is constantly changing. Faced with the huge and complex data on the global Internet, how to crawl and use this information has become a major challenge. A traditional stand-alone web crawler struggles to cope with the rapid growth of information and cannot grab huge amounts of data quickly and effectively. In this paper, we use distributed technology to design and implement an efficient, configurable, load-balanced, and scalable distributed web crawler system.
Pages: 100-104
Page count: 5
Related papers
50 in total
  • [1] Design and implementation of a high-performance distributed web crawler
    Shkapenyuk, V
    Suel, T
    [J]. 18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, : 357 - 368
  • [2] Design and implementation of a full distributed web crawler
    Zhu, Kunpeng
    Wang, Xiaolong
    Liu, Yuanchao
    [J]. Journal of Computational Information Systems, 2009, 5 (04): : 1081 - 1088
  • [3] A distributed Web Crawler design and Java implementation
    Ma, FY
    Zhang, L
    Ye, YM
    Yu, S
    Song, H
    [J]. WORLD WIDE WEB TECHNOLOGIES IN CHINA: RESEARCH, DEVELOPMENT, AND APPLICATIONS, 2002, : 36 - 49
  • [4] Design and Implementation of a Scalable Distributed Web Crawler Based on Hadoop
    Shi, YuLiang
    Zhang, Ti
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2017, : 537 - 541
  • [5] Design of the Distributed Web Crawler
    Chen, Xing
    Li, Weijiang
    Zhao, Tiejun
    Piao, Xinghai
    [J]. ADVANCED RESEARCH ON INDUSTRY, INFORMATION SYSTEMS AND MATERIAL ENGINEERING, PTS 1-7, 2011, 204-210 : 1454 - +
  • [6] Implementation of A Distributed Web Community Crawler
    Park, Seonyoung
    Lee, Youngseok
    [J]. 2014 16TH ASIA-PACIFIC NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (APNOMS), 2014,
  • [7] Design and implementation of a distributed crawler and filtering processor
    Zeinalipour-Yazti, D
    Dikaiakos, M
    [J]. NEXT GENERATION INFORMATION TECHNOLOGIES AND SYSTEMS, 2002, 2382 : 58 - 74
  • [8] IMPLEMENTATION OF WEB CRAWLER
    Gupta, Pooja
    Johari, Kalpana
    [J]. 2009 SECOND INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING AND TECHNOLOGY (ICETET 2009), 2009, : 775 - 780
  • [9] Design and Implementation of Distributed Crawler System Based on Scrapy
    Fan, Yuhao
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCE AND MATERIAL APPLICATION (ESMA2017), VOLS 1-4, 2018, 108
  • [10] Design and Implementation of Competent Web Crawler and Indexer Using Web Services
    Kumar, Santhosh D. K.
    Kamath, Manjunath
    [J]. 2014 INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES (ICACCCT), 2014, : 1672 - 1677