Automatic keyphrase extraction using word embeddings

Cited by: 66
Authors
Zhang, Yuxiang [1]
Liu, Huan [1]
Wang, Suge [2]
Ip, W. H. [3, 4]
Fan, Wei [1]
Xiao, Chunjing [1]
Affiliations
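[1] Civil Aviation University of China, School of Computer Science and Technology, Tianjin, People's Republic of China
[2] Shanxi University, School of Computer and Information Technology, Taiyuan, People's Republic of China
[3] Hong Kong Polytechnic University, Department of Industrial and Systems Engineering, Kowloon, Hong Kong, People's Republic of China
[4] University of Saskatchewan, Department of Mechanical Engineering, Saskatoon, SK, Canada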
Funding
National Natural Science Foundation of China
Keywords
Keyphrase extraction; Random-walk-based keyphrase extraction model; Word embedding; Phrase scoring model; EFFICIENT;
DOI
10.1007/s00500-019-03963-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised random-walk keyphrase extraction models rely mainly on the global structural information of the word graph, in which nodes represent candidate words and edges capture the co-occurrence information between them. However, using word embedding methods to integrate multiple kinds of useful information into the random-walk model for better keyphrase extraction remains relatively unexplored. In this paper, we propose a random-walk-based ranking method that extracts keyphrases from text documents using word embeddings. Specifically, we first design a heterogeneous text graph embedding model to integrate local context information of the word graph (i.e., the local word collocation patterns) with some crucial features of the candidate words and edges of the word graph. Then, a novel random-walk-based ranking model is designed to score candidate words by leveraging the learned word embeddings. Finally, a new and generic similarity-based phrase scoring model using word embeddings is proposed to score phrases, and the top-scoring phrases are selected as keyphrases. Experimental results show that the proposed method consistently outperforms eight state-of-the-art unsupervised methods on three real datasets for keyphrase extraction.
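The general pipeline the abstract describes (word graph, random-walk word scores, phrase scores) can be illustrated with a minimal sketch. This is not the authors' model: the heterogeneous text graph embedding step is not reproduced, the embeddings are hand-written toy vectors, the edge weight (co-occurrence count scaled by cosine similarity) is an assumption chosen for illustration, and the phrase score is a plain sum of word scores standing in for the paper's similarity-based phrase scoring model. All function names (`build_graph`, `rank_words`, `score_phrase`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(tokens, embeddings, window=2):
    """Undirected word graph over candidate words.

    Assumed edge weight: co-occurrence count within `window` tokens,
    scaled by (1 + cosine similarity) / 2 so that it stays non-negative.
    """
    counts = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                counts[tuple(sorted((w, v)))] += 1.0
    graph = {w: {} for w in tokens}
    for (w, v), c in counts.items():
        weight = c * (1.0 + cosine(embeddings[w], embeddings[v])) / 2.0
        graph[w][v] = weight
        graph[v][w] = weight
    return graph


def rank_words(graph, damping=0.85, iters=50):
    """Weighted PageRank-style random walk over the word graph."""
    if not graph:
        return {}
    n = len(graph)
    score = {w: 1.0 / n for w in graph}
    out_sum = {w: sum(graph[w].values()) for w in graph}
    for _ in range(iters):
        new = {}
        for w in graph:
            # The graph is undirected, so the neighbours of w are exactly
            # the nodes that can pass score to w.
            incoming = sum(
                score[u] * graph[u][w] / out_sum[u]
                for u in graph[w]
                if out_sum[u] > 0
            )
            new[w] = (1.0 - damping) / n + damping * incoming
        score = new
    return score


def score_phrase(phrase, scores):
    """Stand-in phrase score: sum of the scores of the member words."""
    return sum(scores.get(w, 0.0) for w in phrase.split())


# Toy usage with made-up two-dimensional embeddings.
tokens = "keyphrase extraction word embedding keyphrase extraction word graph".split()
embeddings = {
    "keyphrase": [0.9, 0.1],
    "extraction": [0.8, 0.3],
    "word": [0.2, 0.9],
    "embedding": [0.1, 0.8],
    "graph": [0.5, 0.5],
}
scores = rank_words(build_graph(tokens, embeddings))
for phrase in ["keyphrase extraction", "word embedding", "word graph"]:
    print(phrase, round(score_phrase(phrase, scores), 3))
```

Summing word scores favors longer phrases, which is one reason a dedicated phrase scoring model, such as the similarity-based one the abstract mentions, may be preferable in practice.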
Pages: 5593-5608
Page count: 16
Source: Soft Computing, 2020, 24: 5593-5608