Information Mining System Design and Implementation Based on Web Crawler

Authors
Lin, Shan [1]
Li, You-meng [1]
Li, Qing-cheng [1]
Affiliations
[1] Nankai Univ, Coll Informat Tech Sci, Tianjin 300072, Peoples R China
Keywords
Crawler; information mining; RSS; low cost
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the information explosion caused by the World Wide Web in recent years, the question of how to process enormous amounts of information efficiently at a reasonable cost has become a concern of information providers, service agencies, and end users. While much research focuses on how to design an efficient web crawler, we turn our attention to how to make the best use of the crawler's results. In this paper, we describe the design and implementation of an information mining system that runs on the results of a web crawler to extract more metadata from unstructured documents for focused search (such as RSS search). We present the software architecture of the system, describe efficient techniques for achieving high performance, and report preliminary experimental results showing that the system can address the issues of robustness, flexibility, and accuracy at a low cost.
Pages: 100-104
Page count: 5