The Design and Implementation of a High-efficiency Distributed Web Crawler

Cited by: 3
Authors
Pu, Qiumei [1]
Affiliation
[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
Keywords
big data; distributed technology; web crawler
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2016.34
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet, the amount of data online keeps growing, and website technology is constantly changing. Faced with the huge and complex data on the global Internet, how to crawl and use this information has become a major challenge. A traditional stand-alone web crawler struggles to cope with the rapid growth of information and cannot grab huge amounts of data quickly and effectively. In this paper, we use distributed technology to design and implement an efficient, configurable, load-balanced, and scalable distributed web crawler system.
Pages: 100 - 104
Number of pages: 5
Related papers
50 items in total
  • [41] Design and Implementation of an Automatic Scanning Tool of SQL Injection Vulnerability Based on Web Crawler
    Lei, Xiaochun
    Qu, Jiashi
    Yao, Gang
    Chen, Junyan
    Shen, Xin
    [J]. SECURITY WITH INTELLIGENT COMPUTING AND BIG-DATA SERVICES, 2020, 895 : 481 - 488
  • [42] Design and Implementation of A Focused Crawler - TargetCrawler
    Feng Jian
    Chen Jing-zhou
    Cao Lei
    [J]. INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2014, 7 (04) : 149 - 156
  • [43] IglooG: A distributed web crawler based on grid service
    Liu, F
    Ma, FY
    Ye, YM
    Li, ML
    Yu, JD
    [J]. WEB TECHNOLOGIES RESEARCH AND DEVELOPMENT - APWEB 2005, 2005, 3399 : 207 - 216
  • [44] A Distributed Web Crawler Model based on Cloud Computing
    Yu, Jiankun
    Li, Mengrong
    Zhang, Dengyin
    [J]. PROCEEDINGS OF THE 2ND INFORMATION TECHNOLOGY AND MECHATRONICS ENGINEERING CONFERENCE (ITOEC 2016), 2016, 24 : 276 - 279
  • [45] Distributed high-performance web crawler based on peer-to-peer network
    Fei, L
    Ma, FY
    Ye, YM
    Li, ML
    Yu, JD
    [J]. PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2004, 3320 : 50 - 53
  • [46] A high-efficiency distributed amplifier by using varying impedance
    Huai, G
    Lin, JM
    Wu, HD
    Shui, YG
    [J]. MICROWAVE AND OPTICAL TECHNOLOGY LETTERS, 2000, 26 (05) : 339 - 341
  • [47] Design and implementation of an administration system for distributed web server
    Yang, CS
    Luo, MY
    [J]. PROCEEDINGS OF THE TWELFTH SYSTEMS ADMINISTRATION CONFERENCE (LISA XII), 1998, : 131 - 139
  • [48] A full distributed Web crawler based on structured network
    Zhu, Kunpeng
    Xu, Zhiming
    Wang, Xiaolong
    Zhao, Yuming
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, 2008, 4993 : 478 - 483
  • [49] Design and implementation of distributed RTI based on Web Services
    Zhou, Xin
    Wei, Jun-Hu
    Li, Peng
    Su, Qin
    [J]. Xitong Fangzhen Xuebao / Journal of System Simulation, 2008, 20 (08) : 2064 - 2067
  • [50] A high-efficiency Data Distribution Algorithm in Distributed Storage
    Yang, Xiao-yuan
    Liu, Zhen
    Zhang, Wei
    Guo, Dun-Tao
    [J]. FIFTH INTERNATIONAL CONFERENCE ON INFORMATION ASSURANCE AND SECURITY, VOL 1, PROCEEDINGS, 2009, : 627 - 630