KEYWORD EXTRACTION OF WEB PAGES BASED ON DOMAIN THESAURUS

Times cited: 0
Authors
He, Guowan [1]
Wang, Jie [1]
Zhang, Yafeng [1]
Peng, Yan [1]
Affiliations
[1] Capital Normal Univ, Sch Management, Beijing 100089, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Keyword extraction; Domain thesaurus; Keyword of web pages; Keyword weight;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method selects candidate keywords using traditional statistical features, such as term frequency and location, and additionally weights each candidate according to its relations in the domain thesaurus. As a result, it can effectively identify domain keywords that occur with low frequency but carry more information in a specific area. Using keyword extraction from environment-domain web pages as an example, the paper introduces the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, the method extracts keywords from environment-related web pages more effectively, with recall higher by 20% on average and accuracy higher by 15% on average.
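The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea it describes: combine frequency and location features with support from thesaurus relations. The toy thesaurus, the function names `score_keywords` and `top_keywords`, and the parameters `title_weight` and `relation_bonus` are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption): term -> related terms.
THESAURUS = {
    "eutrophication": {"water pollution", "algae"},
    "water pollution": {"eutrophication", "pollution"},
    "algae": {"eutrophication"},
    "pollution": {"water pollution"},
}


def score_keywords(title_terms, body_terms, thesaurus,
                   title_weight=3.0, relation_bonus=0.5):
    """Score candidate terms by frequency, location, and thesaurus relations."""
    # Statistical part: each body occurrence counts once,
    # each title occurrence counts title_weight (location feature).
    base = Counter()
    for term in body_terms:
        base[term] += 1.0
    for term in title_terms:
        base[term] += title_weight

    scores = {}
    for term, weight in base.items():
        related = thesaurus.get(term, set())
        # Thesaurus part: count related terms that also occur on the page.
        support = sum(1 for other in related if other in base)
        in_domain = 1.0 if term in thesaurus else 0.0
        scores[term] = weight * (1.0 + in_domain + relation_bonus * support)
    return scores


def top_keywords(scores, k=3):
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Illustrative page: "lake" is frequent but generic,
# "eutrophication" is rare but a thesaurus term with related terms present.
title = ["water pollution"]
body = ["lake", "lake", "eutrophication", "algae", "report"]
scores = score_keywords(title, body, THESAURUS)
print(top_keywords(scores))
```

In this toy input, "eutrophication" occurs once and "lake" twice, yet the thesaurus support lets the domain term rank higher, which is the effect the abstract attributes to the method.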
Pages: 310-314
Number of pages: 5