KEYWORD EXTRACTION OF WEB PAGES BASED ON DOMAIN THESAURUS

Cited by: 0
Authors
He, Guowan [1]
Wang, Jie [1]
Zhang, Yafeng [1]
Peng, Yan [1]
Affiliation
[1] Capital Normal Univ, Sch Management, Beijing 100089, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Keyword extraction; Domain thesaurus; Keyword of web pages; Keyword weight;
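DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract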
This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method scores candidate keywords using traditional statistical features, such as frequency and location, and also adjusts each candidate's weight according to its relations in the domain thesaurus. It can therefore identify domain keywords that occur infrequently on a page but carry more information in a specific field. Using keyword extraction from environment-domain web pages as an example, the paper introduces the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, the method performs better on environment-related web pages, with a reported average of 20% in recall rate and 15% in accuracy rate.
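The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: combine frequency, location, and thesaurus relations into one score. The toy thesaurus, the function names `score_keywords` and `top_keywords`, the additive formula, and the parameters `title_boost` and `thesaurus_boost` are all assumptions made for illustration and are not taken from the paper. Terms are assumed to be already segmented.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption, not from the paper):
# each term maps to the set of terms it is related to.
THESAURUS = {
    "eutrophication": {"water pollution", "algae"},
    "water pollution": {"eutrophication", "pollution"},
    "algae": {"eutrophication"},
    "pollution": {"water pollution"},
}


def score_keywords(title_terms, body_terms, title_boost=2.0, thesaurus_boost=1.5):
    """Score candidate terms by frequency, location, and thesaurus relations.

    The additive formula and the default boosts are illustrative assumptions.
    """
    # Frequency feature: occurrences across title and body.
    freq = Counter(title_terms) + Counter(body_terms)
    title_set = set(title_terms)
    candidates = set(freq)
    scores = {}
    for term, count in freq.items():
        score = float(count)
        # Location feature: terms appearing in the title get a fixed bonus.
        if term in title_set:
            score += title_boost
        # Thesaurus feature: reward a term for each related thesaurus
        # term that also occurs on the same page.
        related = THESAURUS.get(term, set())
        score += thesaurus_boost * len(related & candidates)
        scores[term] = score
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up example page from the environment domain.
title_terms = ["lake", "water pollution"]
body_terms = ["lake", "lake", "report", "report", "report",
              "eutrophication", "algae", "water pollution"]
scores = score_keywords(title_terms, body_terms)
print(top_keywords(scores, 3))
```

In this made-up page, "eutrophication" occurs once and "report" three times, yet the thesaurus term ends up with the higher score because two of its related terms appear on the same page. This is the kind of low-frequency domain keyword the abstract says the method is meant to recover.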
Pages: 310-314
Number of pages: 5
Related papers
50 in total
  • [1] Keyword extraction based on lexical chains for Chinese news web pages
    Hu, Xue-Gang
    Li, Xing-Hua
    Xie, Fei
    Wu, Xin-Dong
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (01): : 45 - 51
  • [2] Keyword Extraction Based on Multi-feature Fusion for Chinese Web Pages
    He, Qi
    Hao, Hong-Wei
    Yin, Xu-Cheng
    [J]. PROCEEDINGS OF THE 2011 2ND INTERNATIONAL CONGRESS ON COMPUTER APPLICATIONS AND COMPUTATIONAL SCIENCE, VOL 1, 2012, 144 : 119 - 124
  • [3] An effective keyword extraction method for videos in Web pages by analyzing their layout structures
    Lee, Jongwon
    Choi, Giseok
    Jang, Juyeon
    Nang, Jongho
    [J]. TENCON 2007 - 2007 IEEE REGION 10 CONFERENCE, VOLS 1-3, 2007, : 92 - +
  • [4] A keyword extraction based model for web advertisement
    Zhou, Ning
    Wu, Jiaxin
    Zhang, Shaolong
    [J]. INTEGRATION AND INNOVATION ORIENT TO E-SOCIETY, VOL 2, 2007, 252 : 168 - +
  • [5] A keyword extraction based model for web advertisement
    Research Center of Information Resources, Wuhan University, Wuhan
    430072, China
    Unknown
    430072, China
    [J]. IFIP Advances in Information and Communication Technology, 2007, (168-175)
  • [6] CIRank: A Method for Keyword Extraction from Web pages using clustering and distribution of nouns
    Rezaei, Mohammad
    Gali, Najlah
    Franti, Pasi
    [J]. 2015 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT), VOL 1, 2015, : 79 - 84
  • [7] Study of Extraction for Web Pages Information Based on XML
    Li, Suming
    [J]. PROCEEDINGS OF THE 2016 2ND WORKSHOP ON ADVANCED RESEARCH AND TECHNOLOGY IN INDUSTRY APPLICATIONS, 2016, 81 : 829 - 832
  • [8] Information extraction from Web pages using presentation regularities and domain knowledge
    Vadrevu, Srinivas
    Gelgi, Fatih
    Davulcu, Hasan
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2007, 10 (02): : 157 - 179
  • [9] Information Extraction from Web Pages Using Presentation Regularities and Domain Knowledge
    Srinivas Vadrevu
    Fatih Gelgi
    Hasan Davulcu
    [J]. World Wide Web, 2007, 10 : 157 - 179
  • [10] Corpus Based Extraction of Hypernyms in Terminological Thesaurus for Land Surveying Domain
    Baisa, Vit
    Suchomel, Vit
    [J]. RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2015), 2015, : 69 - 74