CIRank: A Method for Keyword Extraction from Web Pages Using Clustering and Distribution of Nouns

Times cited: 3

Authors:

Rezaei, Mohammad [1]

Gali, Najlah [1]

Franti, Pasi [1]

Affiliation:

[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland

Source:

2015 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT), VOL 1 | 2015

Keywords:

web mining; keyword extraction; clustering; semantic analysis

DOI:

10.1109/WI-IAT.2015.64

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Text analysis of a web page is more difficult than analysis of the text of a normal document because of additional content such as HTML structure, styling code, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method to extract keywords from a web page. The method extracts unigram nouns by applying part-of-speech tagging to the text, clusters the nouns based on their semantic similarity, and selects a number of keywords from the highest-scored clusters. Experimental results show that our method outperforms the state-of-the-art TextRank method by 13% in precision, 6% in recall, and 10% in F-measure.
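The pipeline summarized in the abstract (tag parts of speech, keep unigram nouns, cluster them by similarity, pick keywords from the top-scored clusters) can be sketched in Python as below. This is not the CIRank algorithm itself: the abstract gives neither the similarity measure nor the cluster-scoring formula, so both are assumptions here. The input is assumed to be text already tagged with Penn Treebank tags by some POS tagger; `trigram_jaccard` is a hypothetical, purely lexical stand-in for semantic similarity; and scoring a cluster by the total frequency of its nouns is a hypothetical choice. The `precision_recall_f1` helper shows the standard set-based precision, recall, and F-measure used in evaluations like the one reported.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Iterable

# Penn Treebank tags for common nouns (singular and plural).
NOUN_TAGS = {"NN", "NNS"}


def extract_nouns(tagged: Iterable[tuple[str, str]]) -> Counter:
    """Count lowercased unigram nouns in (word, tag) pairs from a POS tagger."""
    return Counter(word.lower() for word, tag in tagged if tag in NOUN_TAGS)


def _char_trigrams(word: str) -> set[str]:
    # Words shorter than 3 characters yield themselves as a single "gram".
    return {word[i:i + 3] for i in range(max(1, len(word) - 2))}


def trigram_jaccard(a: str, b: str) -> float:
    """ASSUMPTION: lexical stand-in for semantic similarity (trigram Jaccard)."""
    ga, gb = _char_trigrams(a), _char_trigrams(b)
    return len(ga & gb) / len(ga | gb)


def cluster_nouns(nouns: list[str], similarity: Callable[[str, str], float],
                  threshold: float) -> list[list[str]]:
    """Single-link clustering via union-find: link pairs with similarity >= threshold."""
    parent = {n: n for n in nouns}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(nouns, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for n in nouns:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def select_keywords(tagged: Iterable[tuple[str, str]], k: int = 3,
                    threshold: float = 0.3,
                    similarity: Callable[[str, str], float] = trigram_jaccard) -> list[str]:
    counts = extract_nouns(tagged)
    clusters = cluster_nouns(sorted(counts), similarity, threshold)
    # HYPOTHETICAL score: total frequency of a cluster's nouns (ties: alphabetical).
    clusters.sort(key=lambda c: (-sum(counts[n] for n in c), min(c)))
    # From each of the top-k clusters, keep its most frequent noun.
    return [min(c, key=lambda n: (-counts[n], n)) for c in clusters[:k]]


def precision_recall_f1(predicted: Iterable[str],
                        gold: Iterable[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F-measure (harmonic mean of P and R)."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    p = hits / len(pred) if pred else 0.0
    r = hits / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


tagged = [("Clustering", "NN"), ("groups", "NNS"), ("similar", "JJ"),
          ("nouns", "NNS"), ("noun", "NN"), ("web", "NN"), ("pages", "NNS"),
          ("page", "NN"), ("nouns", "NNS"), ("is", "VBZ"), ("fast", "JJ")]
keywords = select_keywords(tagged, k=3)
print(keywords)  # ['nouns', 'page', 'clustering']
print(precision_recall_f1(keywords, ["nouns", "page", "web"]))
```

With these toy inputs, the noun/nouns and page/pages pairs each form a cluster because they share two of their three character trigrams. Replacing `trigram_jaccard` with a semantic measure (for example, one based on WordNet or word embeddings) would bring the sketch closer to the clustering the abstract describes.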

Pages: 79-84

Number of pages: 6