CIRank: A Method for Keyword Extraction from Web pages using clustering and distribution of nouns

Cited by: 3
Authors
Rezaei, Mohammad [1]
Gali, Najlah [1]
Franti, Pasi [1]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland
Keywords
web mining; keyword extraction; clustering; semantic analysis
DOI
10.1109/WI-IAT.2015.64
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Text analysis of a web page is more difficult than analysis of a normal document because of the additional content it carries, such as HTML structure, styling code, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method to extract keywords from a web page. The method extracts unigram nouns by applying part-of-speech tagging to the text. It then clusters the nouns based on their semantic similarity and selects a number of keywords from the highest-scoring clusters. Experimental results show that our method outperforms the state-of-the-art TextRank by 13% in precision, 6% in recall, and 10% in F-measure.
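The pipeline described in the abstract (nouns, then clusters, then keywords from the best clusters) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the clustering algorithm, or the cluster score, so everything below is an assumption chosen for illustration. The nouns are assumed to be already extracted by a part-of-speech tagger, the `RELATED` table is a hypothetical stand-in for a real semantic similarity measure, clustering is simple single-link merging with a threshold, and a cluster is scored by the total frequency of its nouns.

```python
from collections import Counter

# ASSUMPTION: hand-written stand-in for a semantic similarity measure.
# A real system would compute this from a lexical or distributional resource.
RELATED = {
    frozenset(("hotel", "room")): 0.8,
    frozenset(("hotel", "booking")): 0.7,
    frozenset(("room", "booking")): 0.6,
    frozenset(("cookie", "privacy")): 0.7,
}


def similarity(a, b):
    """Return a similarity in [0, 1] for two nouns (toy lookup)."""
    if a == b:
        return 1.0
    return RELATED.get(frozenset((a, b)), 0.0)


def cluster_nouns(nouns, threshold=0.5):
    """Single-link merging: join two clusters if any cross pair is similar enough."""
    clusters = [{n} for n in sorted(set(nouns))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(similarity(a, b) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


def extract_keywords(nouns, top_clusters=1, per_cluster=2):
    """Pick the most frequent nouns from the highest-scored clusters."""
    freq = Counter(nouns)
    clusters = cluster_nouns(nouns)
    # ASSUMPTION: cluster score = total frequency of its nouns.
    ranked = sorted(clusters,
                    key=lambda c: (-sum(freq[n] for n in c), sorted(c)))
    keywords = []
    for cluster in ranked[:top_clusters]:
        best = sorted(cluster, key=lambda n: (-freq[n], n))
        keywords.extend(best[:per_cluster])
    return keywords


# Nouns as they might come out of a POS tagger run on a hotel booking page.
nouns = ["hotel", "room", "hotel", "booking", "cookie",
         "privacy", "room", "hotel", "weather"]
print(extract_keywords(nouns))  # ['hotel', 'room']
```

In this toy input the nouns "cookie", "privacy", and "weather" stand for irrelevant page text; they fall into small, low-scored clusters and are not selected.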
Pages: 79-84
Page count: 6