KEYWORD EXTRACTION OF WEB PAGES BASED ON DOMAIN THESAURUS

Cited by: 0

Authors:

He, Guowan [1]

Wang, Jie [1]

Zhang, Yafeng [1]

Peng, Yan [1]

Affiliation:

[1] Capital Normal Univ, Sch Management, Beijing 100089, Peoples R China

Source:

2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS) | 2014

Funding:

Beijing Natural Science Foundation;

Keywords:

Keyword extraction; Domain thesaurus; Keyword of web pages; Keyword weight;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method uses traditional statistical features, such as term frequency and location, and also weights each candidate keyword according to its relations in the domain thesaurus. As a result, it can identify domain keywords that occur with low frequency on a page but carry more information in a specific area. Using keyword extraction from web pages in the environment domain as an example, the paper introduces the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, this method achieves better keyword extraction performance on environment-related web pages, with average figures of 20% for recall rate and 15% for accuracy rate.
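The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a frequency score, adjusted by location and by relations in a domain thesaurus. The function name `score_keywords`, the multiplicative combination, the boost parameters, and the toy thesaurus are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter

# Hypothetical toy thesaurus (term -> related terms); illustrative only.
THESAURUS = {
    "eutrophication": {"algae", "nitrogen"},
    "algae": {"eutrophication"},
    "nitrogen": {"eutrophication"},
}


def score_keywords(title_tokens, body_tokens, thesaurus,
                   title_boost=2.0, thesaurus_boost=1.0):
    """Rank candidate terms by frequency, location, and thesaurus relations.

    The way the three features are combined (a simple product) is an
    assumption; the source only says that all three are used.
    """
    counts = Counter(title_tokens) + Counter(body_tokens)
    total = sum(counts.values())
    title_terms = set(title_tokens)

    scores = {}
    for term, count in counts.items():
        # Statistical feature: relative frequency on the page.
        frequency = count / total
        # Location feature: terms that appear in the title get a boost.
        location = title_boost if term in title_terms else 1.0
        # Thesaurus feature: reward thesaurus terms whose related terms
        # also occur on the same page.
        related_on_page = sum(1 for r in thesaurus.get(term, ()) if r in counts)
        domain = 1.0 + thesaurus_boost * related_on_page
        scores[term] = frequency * location * domain

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example page: "eutrophication" occurs once but is related to other terms
# found on the page, so it can outrank the more frequent word "water".
title = ["lake"]
body = ["lake", "water", "water", "eutrophication", "algae", "nitrogen", "today"]
ranked = score_keywords(title, body, THESAURUS)
for term, score in ranked:
    print(f"{term}: {score:.3f}")
```

In this sketch a low-frequency term is lifted only when its thesaurus neighbours also occur on the page, which mirrors the abstract's point that domain keywords with low frequency can still carry more information.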

Pages: 310-314

Page count: 5

