Information Mining System Design and Implementation Based on Web Crawler

Cited by: 0
Authors
Lin, Shan [1]
Li, You-meng [1]
Li, Qing-cheng [1]
Affiliations
[1] Nankai Univ, Coll Informat Tech Sci, Tianjin 300072, Peoples R China
Keywords
Crawler; information mining; RSS; low cost
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the information explosion caused by the World Wide Web in recent years, the question of how to process this enormous volume of information efficiently at a reasonable cost has become a concern of information providers, service agencies and end users. While much research focuses on how to design an efficient web crawler, we turn our attention to how to make the best use of a web crawler's results. In this paper, we describe the design and implementation of an information mining system that runs on the results of a web crawler to extract more metadata from unstructured documents for focused search (such as RSS search). We present the software architecture of the system, describe efficient techniques for achieving high performance, and report preliminary experimental results indicating that the system can address the issues of robustness, flexibility and accuracy at a low cost.
Pages: 100-104
Number of pages: 5