A web crawler design for data mining

Cited by: 111
Author
Thelwall, M [1]
Affiliation
[1] Wolverhampton Univ, Sch Comp & Informat Technol, Wolverhampton WV1 1SB, England
DOI
10.1177/016555150102700503
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage in order to fetch the pages to be analysed. The processing of the text of web pages in order to extract information can be expensive in terms of processor time. Consequently a distributed design is proposed in order to effectively use idle computing resources and to help information scientists avoid the need to employ dedicated equipment. A system developed using the model is examined and the advantages and limitations of the approach are discussed.
Pages: 319-325
Page count: 7
Related papers
50 items in total
  • [1] Design and Application of Intelligent Dynamic Crawler for Web Data Mining
    Zheng Guojun
    Jia Wenchao
    Shi Jihui
    Shi Fan
    Zhu Hao
    Liu Jiang
    [J]. 2017 32ND YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2017, : 1098 - 1105
  • [2] Research on Web Data Mining Based on Topic Crawler
    Guo, Hongjian
    [J]. JOURNAL OF WEB ENGINEERING, 2021, 20 (04): : 1131 - 1143
  • [3] Feature evaluation for web crawler detection with data mining techniques
    Stevanovic, Dusan
    An, Aijun
    Vlajic, Natalija
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (10) : 8707 - 8717
  • [4] Information Mining System Design and Implementation Based on Web Crawler
    Lin, Shan
    Li, You-meng
    Li, Qing-cheng
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEM OF SYSTEMS ENGINEERING (SOSE), 2008, : 100 - 104
  • [5] Design of the Distributed Web Crawler
    Chen, Xing
    Li, Weijiang
    Zhao, Tiejun
    Piao, Xinghai
    [J]. ADVANCED RESEARCH ON INDUSTRY, INFORMATION SYSTEMS AND MATERIAL ENGINEERING, PTS 1-7, 2011, 204-210 : 1454 - +
  • [6] Design of a Mobile Web Crawler for Hidden Web
    Kumar, Manish
    Bhatia, Rajesh
    [J]. 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 2016, : 186 - 190
  • [7] Accuracy Crawler: An Accurate Crawler for Deep Web Data Extraction
    Mishra, Prafful
    Khurana, Anshul
    [J]. 2018 INTERNATIONAL CONFERENCE ON CONTROL, POWER, COMMUNICATION AND COMPUTING TECHNOLOGIES (ICCPCCT), 2018, : 25 - 29
  • [8] Mining Techniques of XSS Vulnerabilities Based on Web Crawler
    Wan Fangfang
    Xie Xusheng
    [J]. MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 6290 - 6293
  • [9] COVID-19 web public opinion monitoring text data mining method based on web crawler
    Chen, Keming
    Wang, ChunPing
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2021, 128 : 223 - 223
  • [10] Research of Web Information Mining by using Crawler Techniques
    Li, Qing-Cheng
    Lin, Shan
    Dong, Zhen-Hua
    [J]. 2008 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, VOLS 1-4, 2008, : 1603 - 1607