MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Cited by: 0
Authors
Zhang, Linhan [1 ,2 ]
Chen, Qian [2 ]
Wang, Wen [2 ]
Deng, Chong [2 ]
Zhang, Shiliang [2 ]
Li, Bing [3 ]
Wang, Wei [4 ]
Cao, Xin [1 ]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
[2] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
[3] A STAR Ctr Frontier AI Res CFAR, Singapore, Singapore
[4] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
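Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code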
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrase extraction (KPE) automatically extracts phrases in a document that provide a concise summary of its core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents because the discrepancy between the sequence lengths causes a mismatch between the representations of keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), which addresses this problem by leveraging a mask strategy and ranking candidates by the similarity between the embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel self-supervised contrastive learning method; KPEBERT is more compatible with MDERank than vanilla BERT. Comprehensive evaluations on six KPE benchmarks demonstrate that the proposed MDERank outperforms the state-of-the-art unsupervised KPE approach by an average of 1.80 F1@15. MDERank further benefits from KPEBERT and overall achieves an average improvement of 3.53 F1@15 over the SOTA SIFRank. Our code is available at https://github.com/LinhanZ/mderank.
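The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a BERT-style document encoder, and the function names (`mask_candidate`, `rank_candidates`) are hypothetical. Two further assumptions are made here: each candidate token is replaced by one `[MASK]` token, and a lower similarity between the source and masked documents is read as higher importance, since masking a key phrase should change the document's meaning more.

```python
import math
import re
from collections import Counter

MASK = "[MASK]"


def embed(text):
    """Toy document embedding (ASSUMPTION: stands in for a BERT-style encoder)."""
    tokens = re.findall(r"\[mask\]|[a-z0-9]+", text.lower())
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_candidate(document, candidate):
    """Replace every occurrence of the candidate with one [MASK] per token."""
    replacement = " ".join([MASK] * len(candidate.split()))
    pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
    return pattern.sub(replacement, document)


def rank_candidates(document, candidates):
    """Rank candidates by ascending similarity between source and masked document."""
    doc_vec = embed(document)
    scored = [
        (cosine(doc_vec, embed(mask_candidate(document, c))), c)
        for c in candidates
    ]
    # ASSUMPTION: lower similarity means the candidate matters more.
    return [c for _, c in sorted(scored, key=lambda pair: pair[0])]


document = (
    "Keyphrase extraction helps retrieval. "
    "Keyphrase extraction ranks phrases. The weather is nice."
)
ranking = rank_candidates(document, ["keyphrase extraction", "weather"])
```

Because both embeddings compared here are document-level, the comparison avoids matching a short phrase against a long document, which is the length mismatch the abstract describes. The toy encoder only reacts to token counts, so it favors frequent and long phrases; a contextual encoder would be needed to reproduce the behavior reported for MDERank.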
Pages: 396-409
Page count: 14
Related papers
50 in total
  • [1] C-Rank: A Concept Linking Approach to Unsupervised Keyphrase Extraction
    Lucca Tosi, Mauro Dalle
    dos Reis, Julio Cesar
    METADATA AND SEMANTIC RESEARCH, MTSR 2019, 2019, 1057 : 236 - 247
  • [2] Keyphrase Extraction Using Enhanced Word and Document Embedding
    Alotaibi, Fahd Saleh
    Sharma, Saurabh
    Gupta, Vishal
    Gupta, Savita
    IETE JOURNAL OF RESEARCH, 2023, 69 (12) : 8876 - 8888
  • [3] Towards unsupervised keyphrase extraction via an autoregressive approach
    Li, Tuohang
    Hu, Liang
    Li, Hongtu
    Sun, Chengyu
    Li, Shuai
    Chi, Ling
    KNOWLEDGE-BASED SYSTEMS, 2023, 274
  • [4] TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents
    Amin, Ahmad
    Rana, Toqir A.
    Mian, Natash Ali
    Iqbal, Muhammad Waseem
    Khalid, Abbas
    Alyas, Tahir
    Tubishat, Mohammad
    IEEE ACCESS, 2020, 8 (08) : 212675 - 212686
  • [5] KP-Rank: a semantic-based unsupervised approach for keyphrase extraction from text data
    Muhammad Aman
    Said Jadid Abdulkadir
    Izzatdin Abdul Aziz
    Hitham Alhussian
    Israr Ullah
    Multimedia Tools and Applications, 2021, 80 : 12469 - 12506
  • [6] KP-Rank: a semantic-based unsupervised approach for keyphrase extraction from text data
    Aman, Muhammad
    Abdulkadir, Said Jadid
    Aziz, Izzatdin Abdul
    Alhussian, Hitham
    Ullah, Israr
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (08) : 12469 - 12506
  • [7] Single-Document Keyphrase Extraction for Multi-Document Keyphrase Extraction
    Berend, Gabor
    Farkas, Richard
    COMPUTACION Y SISTEMAS, 2013, 17 (02) : 179 - 186
  • [8] Improving Embedding-based Unsupervised Keyphrase Extraction by Incorporating Structural Information
    Song, Mingyang
    Liu, Huafeng
    Feng, Yi
    Jing, Liping
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023 : 1041 - 1048
  • [9] HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains
    Merrouni, Zakariae Alami
    Frikh, Bouchra
    Ouhbi, Brahim
    COGNITIVE COMPUTATION, 2022, 14 (02) : 852 - 874
  • [10] PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents
    Florescu, Corina
    Caragea, Cornelia
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017 : 1105 - 1115