Automatic keyphrase extraction using word embeddings

Cited by: 66
Authors
Zhang, Yuxiang [1]
Liu, Huan [1]
Wang, Suge [2]
Ip, W. H. [3, 4]
Fan, Wei [1]
Xiao, Chunjing [1]
Affiliations
[1] Civil Aviat Univ China, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan, Peoples R China
[3] Hong Kong Polytech Univ, Dept Ind & Syst Engn, Kowloon, Hong Kong, Peoples R China
[4] Univ Saskatchewan, Dept Mech Engn, Saskatoon, SK, Canada
Funding
National Natural Science Foundation of China;
Keywords
Keyphrase extraction; Random-walk-based keyphrase extraction model; Word embedding; Phrase scoring model; EFFICIENT;
DOI
10.1007/s00500-019-03963-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Unsupervised random-walk keyphrase extraction models mainly rely on global structural information of the word graph, with nodes representing candidate words and edges capturing the co-occurrence information between candidate words. However, using word embedding methods to integrate multiple kinds of useful information into the random-walk model to improve keyphrase extraction is relatively unexplored. In this paper, we propose a random-walk-based ranking method to extract keyphrases from text documents using word embeddings. Specifically, we first design a heterogeneous text graph embedding model to integrate local context information of the word graph (i.e., the local word collocation patterns) with some crucial features of candidate words and edges of the word graph. Then, a novel random-walk-based ranking model is designed to score candidate words by leveraging the learned word embeddings. Finally, a new and generic similarity-based phrase scoring model using word embeddings is proposed to score phrases and select the top-scoring phrases as keyphrases. Experimental results show that the proposed method consistently outperforms eight state-of-the-art unsupervised methods on three real datasets for keyphrase extraction.
Pages: 5593-5608
Page count: 16