MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Cited by: 0
Authors
Zhang, Linhan [1, 2]
Chen, Qian [2]
Wang, Wen [2]
Deng, Chong [2]
Zhang, Shiliang [2]
Li, Bing [3]
Wang, Wei [4]
Cao, Xin [1]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
[2] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
[3] A STAR Ctr Frontier AI Res CFAR, Singapore, Singapore
[4] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrase extraction (KPE) automatically extracts phrases from a document that concisely summarize its core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between the learned representations of the candidates and the document. Their performance degrades on long documents because the discrepancy in sequence length causes a mismatch between the representations of the keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), which addresses this problem with a masking strategy: it ranks candidates by the similarity between the embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel self-supervised contrastive learning method; KPEBERT is more compatible with MDERank than vanilla BERT. Comprehensive evaluations on six KPE benchmarks demonstrate that the proposed MDERank outperforms the state-of-the-art unsupervised KPE approach by an average of 1.80 F1@15. MDERank further benefits from KPEBERT and overall achieves an average improvement of 3.53 F1@15 over the SOTA SIFRank. Our code is available at https://github.com/LinhanZ/mderank.
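The masking-and-ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It makes several assumptions: a toy bag-of-words vector stands in for the BERT document embedding, each word of a candidate is replaced by one `[MASK]` token, and a lower similarity between the original and the masked document is taken to mean a more important candidate. The names `embed`, `cosine`, `mask_candidate`, and `rank_candidates` are hypothetical.

```python
import math
import re
from collections import Counter

MASK = "[MASK]"  # assumed mask token, one per masked word


def embed(text):
    """Toy stand-in for a document encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"\[mask\]|[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mask_candidate(doc, candidate):
    """Replace every occurrence of the candidate phrase with mask tokens."""
    replacement = " ".join([MASK] * len(candidate.split()))
    pattern = r"\b" + re.escape(candidate) + r"\b"
    return re.sub(pattern, replacement, doc, flags=re.IGNORECASE)


def rank_candidates(doc, candidates):
    """Return (candidate, similarity) pairs, most important first.

    Assumption: masking an important phrase changes the document
    embedding more, so lower similarity ranks higher.
    """
    doc_vec = embed(doc)
    scored = [(c, cosine(doc_vec, embed(mask_candidate(doc, c))))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1])
```

Because both sides of the comparison are full documents of the same length, the sketch avoids comparing a short phrase embedding with a long document embedding, which is the mismatch the abstract describes. Swapping `embed` for a real sentence or document encoder would leave the ranking logic unchanged.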
Pages: 396-409
Number of pages: 14