H-Rank: A keywords extraction method from web pages using POS tags

Cited by: 0
Authors
Shah, Himat [1]
Khan, Muhammad U. S. [1]
Franti, Pasi [1]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland
Keywords
Agglomerative clustering; POS tags; Web pages
DOI
10.1109/indin41052.2019.8972331
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We present a new keyword extraction method that combines the semantic similarity among the frequent words on a web page with the distribution of POS tags. We apply hierarchical clustering to group the semantically similar words that have more coverage of the content of the web page. Our method shows better performance than CL-Rank and other existing methodologies.
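The pipeline the abstract outlines (filter words by POS tag, keep the frequent ones, cluster them hierarchically by similarity, prefer clusters that cover more of the page) can be illustrated with a minimal sketch. This is not the authors' H-Rank implementation. The tagged input, the noun/adjective filter, the threshold, and all function names are hypothetical, and a character-bigram Dice score stands in for the semantic similarity measure, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: (word, POS tag) pairs, as a POS tagger might emit them.
TAGGED = [
    ("hotel", "NN"), ("hotels", "NNS"), ("booking", "NN"), ("book", "VB"),
    ("cheap", "JJ"), ("hotel", "NN"), ("room", "NN"), ("rooms", "NNS"),
    ("the", "DT"), ("booking", "NN"), ("hotel", "NN"), ("room", "NN"),
    ("weather", "NN"), ("is", "VBZ"),
]

def bigrams(word):
    """Set of character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def similarity(a, b):
    """Dice overlap of character bigrams (a stand-in, NOT semantic similarity)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y)) if x and y else 0.0

def frequent_words(tagged, keep_prefixes=("NN", "JJ"), top_n=6):
    """Most frequent words whose tag is a noun or adjective (assumed filter)."""
    counts = Counter(w for w, tag in tagged if tag.startswith(keep_prefixes))
    return [w for w, _ in counts.most_common(top_n)]

def agglomerate(words, threshold=0.5):
    """Average-linkage agglomerative clustering, stopping below the threshold."""
    clusters = [[w] for w in words]

    def linkage(c1, c2):
        return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def keywords(tagged, top_clusters=2):
    """Rank clusters by summed word frequency; return one word per top cluster."""
    counts = Counter(w for w, _ in tagged)
    clusters = agglomerate(frequent_words(tagged))
    clusters.sort(key=lambda c: sum(counts[w] for w in c), reverse=True)
    return [max(c, key=counts.__getitem__) for c in clusters[:top_clusters]]

print(keywords(TAGGED))
```

On this toy input, "hotel"/"hotels" and "room"/"rooms" merge into two clusters, and the most frequent member of each is returned. A real system would swap in a genuine semantic similarity measure and a real POS tagger.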
Pages: 264-269
Page count: 6