An Improved Focused Crawler Based on Text Keyword Extraction

Cited by: 0
Authors
Zheng, Zhang [1]
Qian, Du [2]
Affiliations
[1] Wuhan Univ Technol, Dept Informat Technol, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ Technol, Affiliat Dept Informat Technol, Wuhan, Hubei, Peoples R China
Keywords
focused crawler; keyword extraction; TF-IDF; syntactic dependency analysis
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
To address the shortcomings of traditional focused crawlers, this paper proposes an improved focused crawling method based on syntactic dependency analysis. The method first generates a collection of words from the input text using the TF-IDF algorithm and a collection of phrases using syntactic dependency analysis. It then evaluates the words and phrases together to select the keyword set of the text. Next, the keyword set is submitted to a general-purpose search engine, and part of the search results are used as seed links for the focused crawler. The crawler follows a best-first search policy that ranks each link by the similarity between the keywords and the link's anchor text combined with its surrounding context. Because the extracted keyword set contains phrases as well as single words, the relevance of both the seeds and the crawled links improves. Experimental results show that the similarity between the crawled pages and the input text is 14.3 percent higher with this method than with manually chosen keywords. The method performs well for focused crawlers that take text as input, for vertical search engines, and in related application fields.
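The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the smoothed IDF, the cosine similarity, the `anchor_weight` parameter, and all function and class names here are assumptions made for illustration. The syntactic dependency step is not reproduced; phrases such as "focused crawler" are simply passed in as ready-made keywords.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(doc, corpus, top_k=5):
    """Rank the words of `doc` by TF-IDF against a background `corpus`.

    The smoothed IDF used here is an assumption, not the paper's formula.
    """
    tf = Counter(tokenize(doc))
    total = sum(tf.values())
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(doc_sets)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for s in doc_sets if word in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / total) * idf
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bags of tokens."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link_priority(keywords, anchor, context, anchor_weight=0.7):
    """Combine anchor-text and context similarity (weight is hypothetical)."""
    kw_tokens = [t for k in keywords for t in tokenize(k)]
    return anchor_weight * cosine(kw_tokens, tokenize(anchor)) + (
        1.0 - anchor_weight
    ) * cosine(kw_tokens, tokenize(context))


class Frontier:
    """Best-first crawl frontier: the highest-priority link is popped first."""

    def __init__(self, keywords):
        self.keywords = keywords
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that preserves insertion order

    def push(self, url, anchor, context=""):
        if url in self._seen:
            return
        self._seen.add(url)
        score = link_priority(self.keywords, anchor, context)
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    # Placeholder URLs and texts, purely for illustration.
    frontier = Frontier(["focused crawler", "keyword"])
    frontier.push("http://example.com/a", "cooking recipes", "pasta and sauces")
    frontier.push(
        "http://example.com/b",
        "focused crawler tutorial",
        "keyword extraction for crawlers",
    )
    print(frontier.pop())
```

In this sketch a link whose anchor text and context share terms with the keyword set is popped before an unrelated one, which is the behaviour a best-first policy relies on. A real crawler would fetch each popped page, extract its links, and push them back onto the frontier.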
Pages: 386-390
Page count: 5
Related papers
50 in total
  • [21] Ontology based learnable focused crawler
    Software School, Xiamen Univ., Xiamen 361005, China
    Unknown
    J. Comput. Inf. Syst., 2007, 3 : 1173 - 1180
  • [22] An ontology-based focused crawler
    Kozanidis, Lefteris
    NATURAL LANGUAGE AND INFORMATION SYSTEMS, PROCEEDINGS, 2008, 5039 : 376 - 379
  • [23] A Text Feature Based Automatic Keyword Extraction Method for Single Documents
    Campos, Ricardo
    Mangaravite, Vitor
    Pasquali, Arian
    Jorge, Alipio Mario
    Nunes, Celia
    Jatowt, Adam
    ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018), 2018, 10772 : 684 - 691
  • [24] Ontology-based focused crawler
    Lu, Gechao
    Zuo, Wanli
    Zhang, Aiqi
    Wang, Ying
    Ji, Wenyan
    Journal of Information and Computational Science, 2010, 7 (02) : 577 - 584
  • [25] Keyword Combination Extraction in Text Categorization Based on Ant Colony Optimization
    Yu, Zi-jun
    Wu, Wei-gang
    Xiao, Jing
    Zhang, Jun
    Huang, Rui-Zhang
    Liu, Ou
    2009 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION, 2009, : 430 - +
  • [26] Automatic Keyword Extraction From Dialogue Text
    Sali, Yusuf
    Erden, Mustafa
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [27] Keyword extraction for social media short text
    Zhao, Dexin
    Du, Nana
    Chang, Zhi
    Li, Yukun
    2017 14TH WEB INFORMATION SYSTEMS AND APPLICATIONS CONFERENCE (WISA 2017), 2017, : 251 - 256
  • [28] An Improved Search Algorithm of Focused Crawler in Vertical Search Engine
    Zuo, Xiao-jun
    Zhang, Kai-tuo
    ASIA-PACIFIC YOUTH CONFERENCE ON COMMUNICATION TECHNOLOGY 2010 (APYCCT 2010), 2010, : 509 - +
  • [29] A Novel Focused Crawler Based on Breadcrumb Navigation
    Ying, Lizhi
    Zhou, Xinhao
    Yuan, Jian
    Huang, Yongfeng
    ADVANCES IN SWARM INTELLIGENCE, ICSI 2012, PT II, 2012, 7332 : 264 - 271
  • [30] Focused image crawler based on mobile agent
    Lin Kunhui
    Zhang Lei
    Zhou Changle
    Ni Ziwei
    Wu Qingfeng
    Advanced Computer Technology, New Education, Proceedings, 2007, : 808 - 811