KEYWORD EXTRACTION OF WEB PAGES BASED ON DOMAIN THESAURUS

Cited by: 0

Authors:

He, Guowan [1]

Wang, Jie [1]

Zhang, Yafeng [1]

Peng, Yan [1]

Affiliation:

[1] Capital Normal Univ, Sch Management, Beijing 100089, Peoples R China

Source:

2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS) | 2014

Funding:

Beijing Natural Science Foundation;

Keywords:

Keyword extraction; Domain thesaurus; Keyword of web pages; Keyword weight;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method uses traditional statistical features, such as term frequency and location, and also weights each candidate keyword according to its relations in the domain thesaurus. It can therefore identify domain keywords that occur infrequently on a page but carry more information in a specific field. Using keyword extraction from web pages in the environmental domain as an example, the paper introduces the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, this method extracts keywords from environment-related web pages more effectively; the reported figures are an average of 20% for recall rate and an average of 15% for accuracy rate.
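
The abstract does not give the weighting formula, so the following is only a minimal sketch of how frequency, location, and thesaurus relations might be combined into one keyword weight. The multiplicative combination, the `title_boost` and `domain_boost` parameters, the toy `THESAURUS` mapping, and the function names are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter

# Hypothetical toy domain thesaurus: term -> set of related terms.
# The real thesaurus structure used in the paper is not described in the abstract.
THESAURUS = {
    "eutrophication": {"algae", "water pollution"},
    "algae": {"eutrophication"},
    "water pollution": {"eutrophication"},
}


def score_keywords(title_tokens, body_tokens, thesaurus,
                   title_boost=2.0, domain_boost=1.0):
    """Return {term: weight} from frequency, location, and thesaurus relations.

    Assumed formula: weight = tf * location_factor * domain_factor.
    """
    tokens = list(title_tokens) + list(body_tokens)
    counts = Counter(tokens)
    total = len(tokens)
    title_set = set(title_tokens)
    present = set(counts)

    scores = {}
    for term, n in counts.items():
        tf = n / total  # relative term frequency on the page
        # Location feature: terms that appear in the title are boosted.
        location = title_boost if term in title_set else 1.0
        # Thesaurus feature: boost domain terms, more so when related
        # thesaurus terms also occur on the same page.
        if term in thesaurus:
            related_on_page = len(thesaurus[term] & present)
            domain = 1.0 + domain_boost * (1 + related_on_page)
        else:
            domain = 1.0
        scores[term] = tf * location * domain
    return scores


def top_keywords(scores, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Usage: "city" is more frequent, but the domain terms are weighted higher.
title = ["news"]
body = ["city", "city", "eutrophication", "algae", "park"]
scores = score_keywords(title, body, THESAURUS)
print(top_keywords(scores, k=2))
```

In this sketch a low-frequency term that belongs to the thesaurus can outrank a more frequent general term, which is the behavior the abstract describes; a frequency-only score would rank "city" first.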


Pages: 310-314

Page count: 5
