A Focused Crawler Based on Correlation Analysis

Cited by: 0
Authors
Qin, Qiuli [1 ]
Peng, Xin [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Econ & Management, Logist Technol & Management Lab, Beijing 100044, Peoples R China
Keywords
Focused Crawler; web crawler; VSM; TF-IDF;
DOI
10.14257/ijfgcn.2014.7.6.02
Chinese Library Classification number
TN [Electronic technology, communication technology];
Discipline classification code
0809 ;
Abstract
With the rapid development of network and information technology, the internet now holds a huge amount of data. A major problem for researchers is how to effectively filter information on a particular subject or field out of this data. In this paper, we build a focused crawler based on the vector space model (VSM) and TF-IDF text correlation analysis. The crawler takes seed URLs as its collection entry points and fetches web pages from the internet. It then analyzes each page with techniques such as web content extraction and page link analysis to obtain the page's main content. Using a correlation analysis method based on VSM and TF-IDF, we calculate the correlation between each page and the predefined topics, so that the crawl stays focused on the target subject area of the web.
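The correlation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names `tf_idf_vectors`, `cosine_similarity`, and `page_relevance` are all assumptions made for illustration. Each text is turned into a TF-IDF weighted term vector (the VSM representation), and a page's relevance to the topic is taken as the cosine of the angle between the two vectors.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumption: a real crawler
    would use a proper tokenizer and stop-word list)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document.

    Weight = term frequency * smoothed inverse document frequency.
    The smoothing log((1 + N) / (1 + df)) + 1 is an assumed variant,
    chosen so that every weight stays positive.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def page_relevance(topic, pages):
    """Score each page's main content against the topic description."""
    vectors = tf_idf_vectors([topic] + pages)
    topic_vector = vectors[0]
    return [cosine_similarity(topic_vector, v) for v in vectors[1:]]


# Hypothetical topic description and page texts.
topic = "focused web crawler topic relevance"
pages = [
    "a focused crawler fetches web pages relevant to a topic",
    "recipes for baking bread at home",
]
scores = page_relevance(topic, pages)
```

In a crawler built along these lines, pages whose score falls below some chosen cutoff would be dropped and their links not followed. The cutoff value is a design decision that the abstract does not specify.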
Pages: 13 - 20
Number of pages: 8
Related papers
50 in total
  • [1] Adaptive focused crawler based on tunneling and link analysis
    Zhang, Xiaoming
    Li, Zhoujun
    Hu, Chaojian
    11TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, VOLS I-III, PROCEEDINGS: UBIQUITOUS ICT CONVERGENCE MAKES LIFE BETTER!, 2009, : 2225 - 2230
  • [2] Ontology based learnable focused crawler
    Software School, Xiamen Univ., Xiamen 361005, China
    Unknown
    J. Comput. Inf. Syst., 2007, 3 : 1173 - 1180
  • [3] An ontology-based focused crawler
    Kozanidis, Lefteris
    NATURAL LANGUAGE AND INFORMATION SYSTEMS, PROCEEDINGS, 2008, 5039 : 376 - 379
  • [4] Ontology-based focused crawler
    Lu, Gechao
    Zuo, Wanli
    Zhang, Aiqi
    Wang, Ying
    Ji, Wenyan
    Journal of Information and Computational Science, 2010, 7 (02): : 577 - 584
  • [5] A Focused Linked Data Crawler based on HTML Link Analysis
    Emamdadi, Reihaneh
    Kahani, Mohsen
    Zarrinkalam, Fattane
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 74 - 79
  • [6] HAWK: A Focused Crawler with Content and Link Analysis
    Chen, Xiaoyun
    Zhang, Xin
    PROCEEDINGS OF THE ICEBE 2008: IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING, 2008, : 677 - 680
  • [7] A Novel Focused Crawler Based on Breadcrumb Navigation
    Ying, Lizhi
    Zhou, Xinhao
    Yuan, Jian
    Huang, Yongfeng
    ADVANCES IN SWARM INTELLIGENCE, ICSI 2012, PT II, 2012, 7332 : 264 - 271
  • [8] Focused image crawler based on mobile agent
    Lin Kunhui
    Zhang Lei
    Zhou Changle
    Ni Ziwei
    Wu Qingfeng
    Advanced Computer Technology, New Education, Proceedings, 2007, : 808 - 811
  • [9] A Focused Crawler Based on Naive Bayes Classifier
    Wang, Wenxian
    Chen, Xingshu
    Zou, Yongbin
    Wang, Haizhou
    Dai, Zongkun
    2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 517 - 521
  • [10] An intelligent focused crawler based on genetic algorithm
    Yu, Chun
    Du, Yajun
    Liu, Wenjun
    Journal of Computational Information Systems, 2014, 10 (18): : 8059 - 8066