The Design and Implementation of a High-efficiency Distributed Web Crawler

Cited by: 3
Authors
Pu, Qiumei [1]
Affiliations
[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
Keywords
big data; distributed technology; web crawler
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2016.34
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet, the amount of data online has grown enormously, and website technology is constantly changing. Faced with the huge and complex data on the global Internet, how to crawl and use this information has become a major challenge. A traditional stand-alone web crawler struggles to cope with the rapid growth of information and cannot collect huge amounts of data quickly and effectively. In this paper, we use distributed technology to design and implement an efficient, configurable, load-balanced, and scalable distributed web crawler system.
Pages: 100 - 104
Number of pages: 5
Related papers
50 in total
  • [31] UniCrawl: A Practical Geographically Distributed Web Crawler
    Le Quoc, Do
    Fetzer, Christof
    Felber, Pascal
    Riviere, Etienne
    Schiavoni, Valerio
    Sutra, Pierre
    [J]. 2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015, : 389 - 396
  • [32] A web crawler design for data mining
    Thelwall, M
    [J]. JOURNAL OF INFORMATION SCIENCE, 2001, 27 (05) : 319 - 325
  • [33] Design, Implementation, and Evaluation of High-Efficiency High-Power Radio-Frequency Inductors
    Bayliss, Roderick S., III
    Yang, Rachel S.
    Hanson, Alex J.
    Sullivan, Charles R.
    Perreault, David J.
    [J]. 2021 THIRTY-SIXTH ANNUAL IEEE APPLIED POWER ELECTRONICS CONFERENCE AND EXPOSITION (APEC 2021), 2021, : 881 - 888
  • [34] Design and Implementation of a Web Crawler System based on an Adaptive Page-Rank algorithm
    Zhang, Xin
    Cheng, Zhi
    Zhang, Chen
    [J]. 2020 3RD INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SCIENCE AND APPLICATION TECHNOLOGY (CISAT) 2020, 2020, 1634
  • [35] High-efficiency generator design validated
    Unknown
    [J]. AMERICAN CERAMIC SOCIETY BULLETIN, 2003, 82 (09) : 7 - 8
  • [36] A High-Efficiency RFID Middleware Design
    Chang, Teng-Hsun
    Chen, Nong-Kun
    Chen, Jiann-Liang
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2009, 10 (04) : 405 - 412
  • [37] Design of a high-efficiency magnetorheological valve
    Yoo, JH
    Wereley, NM
    [J]. JOURNAL OF INTELLIGENT MATERIAL SYSTEMS AND STRUCTURES, 2002, 13 (10) : 679 - 685
  • [38] TOPICS IN A HIGH-EFFICIENCY FEL DESIGN
    HO, AH
    PANTELL, RH
    FEINSTEIN, J
    [J]. NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 1992, 318 (1-3) : 758 - 764
  • [39] Thermocompressor Design and Operation for High-Efficiency
    Soucy, M.
    Timm, G. L.
    [J]. PULP & PAPER-CANADA, 2010, 111 (05) : 34 - 37
  • [40] The design and implementation of the crawler-Inar
    Ding, Yu-Xin
    Wang, Xiao-Long
    Lin, Le-Bin
    Zhang, Qi
    Wu, Yong-Hui
    [J]. PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 4527 - +