Automatic keyphrase extraction using word embeddings

Cited by: 66
Authors
Zhang, Yuxiang [1]
Liu, Huan [1]
Wang, Suge [2]
Ip, W. H. [3, 4]
Fan, Wei [1]
Xiao, Chunjing [1]
Affiliations
[1] Civil Aviat Univ China, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan, Peoples R China
[3] Hong Kong Polytech Univ, Dept Ind & Syst Engn, Kowloon, Hong Kong, Peoples R China
[4] Univ Saskatchewan, Dept Mech Engn, Saskatoon, SK, Canada
Funding
National Natural Science Foundation of China
Keywords
Keyphrase extraction; Random-walk-based keyphrase extraction model; Word embedding; Phrase scoring model; EFFICIENT;
DOI
10.1007/s00500-019-03963-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised random-walk keyphrase extraction models mainly rely on the global structural information of the word graph, in which nodes represent candidate words and edges capture the co-occurrence information between candidate words. However, using word embedding methods to integrate multiple kinds of useful information into the random-walk model, and thereby extract keyphrases more accurately, is relatively unexplored. In this paper, we propose a random-walk-based ranking method that extracts keyphrases from text documents using word embeddings. Specifically, we first design a heterogeneous text graph embedding model to integrate local context information of the word graph (i.e., the local word collocation patterns) with some crucial features of the candidate words and edges of the word graph. Then, a novel random-walk-based ranking model is designed to score candidate words by leveraging the learned word embeddings. Finally, a new and generic similarity-based phrase scoring model using word embeddings is proposed to score phrases, and the top-scoring phrases are selected as keyphrases. Experimental results show that the proposed method consistently outperforms eight state-of-the-art unsupervised methods on three real datasets for keyphrase extraction.
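The general idea of a random walk over a word graph whose edges are informed by word embeddings can be illustrated with a minimal sketch. This is not the model proposed in the paper: the abstract does not give the edge weighting, the transition probabilities, or the phrase scoring formula, so everything below is an assumption made for illustration. The toy two-dimensional embeddings, the weight "co-occurrence count times rescaled cosine similarity", the damping factor of 0.85, and the sum-of-word-scores phrase score (a simplification standing in for the paper's similarity-based phrase scoring) are all hypothetical choices.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(tokens, embeddings, window=2):
    """Undirected word graph. Assumed edge weight:
    co-occurrence count * cosine similarity rescaled to [0, 1]."""
    cooc = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                cooc[frozenset((w, tokens[j]))] += 1.0
    graph = defaultdict(dict)
    for pair, count in cooc.items():
        a, b = tuple(pair)
        sim = (1.0 + cosine(embeddings[a], embeddings[b])) / 2.0
        graph[a][b] = graph[b][a] = count * sim
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted graph."""
    nodes = sorted(graph)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Mass arriving from each neighbour m, proportional to edge weight.
            incoming = sum(
                scores[m] * w / out_weight[m]
                for m, w in graph[n].items()
                if out_weight[m] > 0
            )
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def phrase_score(phrase, scores):
    """Simplified phrase score: sum of the scores of the words in the phrase."""
    return sum(scores.get(word, 0.0) for word in phrase.split())


# Toy document and hypothetical 2-d embeddings (illustrative values only).
tokens = ["keyphrase", "extraction", "word", "embedding",
          "keyphrase", "extraction", "graph", "embedding"]
embeddings = {
    "keyphrase": [0.9, 0.1],
    "extraction": [0.8, 0.3],
    "word": [0.2, 0.9],
    "embedding": [0.3, 0.8],
    "graph": [0.5, 0.5],
}

graph = build_graph(tokens, embeddings)
scores = random_walk_scores(graph)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}: {scores[word]:.4f}")
print("keyphrase extraction:", round(phrase_score("keyphrase extraction", scores), 4))
```

Rescaling the cosine similarity to [0, 1] keeps every edge weight non-negative, so each row of transition probabilities is well defined; with real embeddings a different weighting could be substituted without changing the rest of the sketch.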
Pages: 5593-5608
Page count: 16