HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains

Cited by: 4
Authors
Merrouni, Zakariae Alami [1]
Frikh, Bouchra [1]
Ouhbi, Brahim [2]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Natl Sch Appl Sci ENSA, LIASSE Lab, BP 72, Route Dimouzer, Fes, Morocco
[2] Moulay Ismail Univ UMI, Natl Higher Sch Arts & Crafts ENSAM, Math Modeling & Comp Lab LM2I, Marjane 2, BP 4024, Meknes, Morocco
Keywords
Automatic keyphrase extraction; Unsupervised machine learning; Feature selection; Keyword extraction; System
DOI
10.1007/s12559-021-09979-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Keyphrases capture the main content of a free-text document. Automatic keyphrase extraction (AKPE) plays a significant role in retrieving and summarizing valuable information from documents across different domains. Various techniques have been proposed for this task. However, supervised AKPE requires large amounts of annotated data and depends on the domain it is tested on. An alternative is a domain-independent method that can be applied to several domains (such as medical or social). In this paper, we tackle keyphrase extraction from single documents with HAKE, a novel unsupervised method that simultaneously mines linguistic, statistical, structural, and semantic text features to select the most relevant keyphrases in a text. HAKE achieves higher F-scores than state-of-the-art unsupervised systems on standard datasets and is suitable for real-time processing of large amounts of Web and text data across different domains. HAKE also explicitly increases coverage and diversity among the selected keyphrases through a novel technique for candidate keyphrase identification and extraction, based on a parse tree approach, part-of-speech tagging, and filtering. This technique generates a comprehensive and meaningful list of candidate keyphrases and reduces the size of the candidate set without increasing computational complexity. HAKE's effectiveness is compared with twelve state-of-the-art and recent unsupervised approaches, as well as with some supervised approaches. Experiments on five of the top available benchmark corpora from different domains validate the proposed method and show that HAKE significantly outperforms both the existing unsupervised and supervised methods. Our method requires no training on a particular set of documents and does not depend on external corpora, dictionaries, domain, or text size. Our experiments confirm that HAKE's candidate selection model and its ranking model are effective.
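The two stages named in the abstract, candidate identification followed by feature-based ranking, can be illustrated with a minimal Python sketch. This is not HAKE's published algorithm. It assumes the text is already split into sentences and POS-tagged with Penn Treebank tags. It also replaces the parse-tree step with a simple filter that keeps runs of adjectives and nouns ending in a noun. Each candidate is then scored with three illustrative features: frequency (statistical), first sentence position (structural), and phrase length in words (linguistic). Semantic features are omitted. The function names `extract_candidates` and `rank_candidates` and the multiplicative scoring formula are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A sentence is a list of (token, Penn Treebank POS tag) pairs.
# Assumption: tagging happens upstream; this sketch does not tag raw text.
Sentence = List[Tuple[str, str]]


def extract_candidates(sentence: Sentence) -> List[str]:
    """Hypothetical candidate filter: maximal runs of adjectives (JJ*) and
    nouns (NN*) that end in a noun, lowercased and joined with spaces."""
    candidates: List[str] = []
    span: List[Tuple[str, str]] = []
    # A sentinel tag flushes the last open span at the end of the sentence.
    for token, tag in sentence + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            span.append((token, tag))
            continue
        # Drop trailing adjectives so every candidate ends in a noun.
        while span and not span[-1][1].startswith("NN"):
            span.pop()
        if span:
            candidates.append(" ".join(tok.lower() for tok, _ in span))
        span = []
    return candidates


def rank_candidates(sentences: List[Sentence], top_k: int = 5) -> List[Tuple[str, float]]:
    """Score candidates with three illustrative features and return the top_k."""
    freq: Dict[str, int] = defaultdict(int)
    first_sentence: Dict[str, int] = {}
    for idx, sentence in enumerate(sentences):
        for cand in extract_candidates(sentence):
            freq[cand] += 1
            first_sentence.setdefault(cand, idx)

    scores: Dict[str, float] = {}
    for cand, count in freq.items():
        statistical = count                             # occurrences in the document
        structural = 1.0 / (1 + first_sentence[cand])   # earlier first mention scores higher
        linguistic = len(cand.split())                  # favor multi-word phrases
        scores[cand] = statistical * structural * linguistic

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


example: List[Sentence] = [
    [("Automatic", "JJ"), ("keyphrase", "NN"), ("extraction", "NN"),
     ("is", "VBZ"), ("useful", "JJ"), (".", ".")],
    [("Keyphrase", "NN"), ("extraction", "NN"), ("supports", "VBZ"),
     ("document", "NN"), ("summarization", "NN"), (".", ".")],
]

for phrase, score in rank_candidates(example):
    print(f"{score:.2f}  {phrase}")
```

Multiplying the features means a phrase must score reasonably well on all of them to rank highly. A weighted sum is an equally plausible alternative. The sketch also counts overlapping phrases such as "keyphrase extraction" and "automatic keyphrase extraction" separately. A fuller system would likely need to handle this kind of subsumption when it aims for coverage and diversity.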
Pages: 852–874
Number of pages: 23
Related papers
50 in total
  • [1] HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains
    Merrouni, Zakariae Alami
    Frikh, Bouchra
    Ouhbi, Brahim
    [J]. Cognitive Computation, 2022, 14: 852–874
  • [2] A Fuzzy Approach to Improve an Unsupervised Automatic Keyphrase Extraction Process
    Perez-Guadarrama, Yamel
    Simon-Cuevas, Alfredo
    Hojas-Mazo, Wenny
    Olivas, Jose A.
    Romero, Francisco P.
    [J]. 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018
  • [3] SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation
    Alrehamy, Hassan H.
    Walker, Coral
    [J]. Advances in Computational Intelligence Systems, 2018, 650: 222–235
  • [4] Towards unsupervised keyphrase extraction via an autoregressive approach
    Li, Tuohang
    Hu, Liang
    Li, Hongtu
    Sun, Chengyu
    Li, Shuai
    Chi, Ling
    [J]. Knowledge-Based Systems, 2023, 274
  • [5] ISKE: An unsupervised automatic keyphrase extraction approach using the iterated sentences based on graph method
    Chi, Ling
    Hu, Liang
    [J]. Knowledge-Based Systems, 2021, 223
  • [6] A Supervised Learning Approach for Automatic Keyphrase Extraction
    Abulaish, Muhammad
    Anwar, Tarique
    [J]. International Journal of Innovative Computing, Information and Control, 2012, 8(11): 7579–7601
  • [7] PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents
    Florescu, Corina
    Caragea, Cornelia
    [J]. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vol. 1, 2017: 1105–1115
  • [8] A Graph-based Approach of Automatic Keyphrase Extraction
    Yan Ying
    Tan Qingping
    Xie Qinzheng
    Zeng Ping
    Li Panpan
    [J]. Advances in Information and Communication Technology, 2017, 107: 248–255
  • [9] Keyphrase Distance Analysis Technique from News Articles as a Feature for Keyphrase Extraction: An Unsupervised Approach
    Miah, Mohammad Badrul Alam
    Awang, Suryanti
    Rahman, Md Mustafizur
    Hosen, A. S. M. Sanwar
    [J]. International Journal of Advanced Computer Science and Applications, 2023, 14(10): 995–1002
  • [10] TripleRank: An unsupervised keyphrase extraction algorithm
    Li, Tuohang
    Hu, Liang
    Li, Hongtu
    Sun, Chengyu
    Li, Shuai
    Chi, Ling
    [J]. Knowledge-Based Systems, 2021, 219(219)