MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Times cited: 0
Authors
Zhang, Linhan [1, 2]
Chen, Qian [2]
Wang, Wen [2]
Deng, Chong [2]
Zhang, Shiliang [2]
Li, Bing [3]
Wang, Wei [4]
Cao, Xin [1]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
[2] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
[3] A*STAR Ctr Frontier AI Res CFAR, Singapore, Singapore
[4] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Keywords
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Keyphrase extraction (KPE) automatically extracts phrases in a document that provide a concise summary of the core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents due to the discrepancy between sequence lengths, which causes a mismatch between the representations of keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel self-supervised contrastive learning method, which is more compatible with MDERank than vanilla BERT. Comprehensive evaluations on six KPE benchmarks demonstrate that the proposed MDERank outperforms the state-of-the-art unsupervised KPE approach by an average of 1.80 F1@15. MDERank further benefits from KPEBERT and overall achieves an average improvement of 3.53 F1@15 over the SOTA SIFRank. Our code is available at https://github.com/LinhanZ/mderank.
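A minimal sketch of the masking-and-ranking idea summarized in the abstract, assuming several simplifications that are not from the paper: the function names (`toy_embed`, `mask_candidate`, `mderank`) are hypothetical; `toy_embed` is a bag-of-words stand-in for the BERT or KPEBERT document embedding, used only so the example runs without a model; candidates are supplied as token tuples instead of being extracted from the text; and candidates whose masking lowers the similarity to the original document the most are ranked first, on the reading that masking an important phrase changes the document's meaning more. This is an illustration, not the authors' implementation (their code is at the repository linked above).

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def toy_embed(tokens: Sequence[str]) -> Vector:
    """Hypothetical stand-in for a document encoder: a bag-of-words count vector.
    MDERank uses a pretrained model (BERT or KPEBERT) at this step."""
    return {tok: float(count) for tok, count in Counter(tokens).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_candidate(tokens: Sequence[str], candidate: Sequence[str],
                   mask: str = "[MASK]") -> List[str]:
    """Replace every occurrence of the candidate phrase with one mask token per word."""
    n = len(candidate)
    if n == 0:
        raise ValueError("candidate must contain at least one token")
    masked: List[str] = []
    i = 0
    while i < len(tokens):
        if list(tokens[i:i + n]) == list(candidate):
            masked.extend([mask] * n)
            i += n
        else:
            masked.append(tokens[i])
            i += 1
    return masked


def mderank(tokens: Sequence[str], candidates: Sequence[Sequence[str]],
            embed: Callable[[Sequence[str]], Vector] = toy_embed) -> List[Tuple[str, float]]:
    """Score each candidate by similarity(original doc, doc with candidate masked).
    Assumed ordering: lower similarity means masking removed more meaning, so it ranks higher."""
    doc_vec = embed(tokens)
    scored = []
    for cand in candidates:
        sim = cosine(doc_vec, embed(mask_candidate(tokens, cand)))
        scored.append((" ".join(cand), sim))
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    doc = "deep learning for deep learning and graphs".split()
    for phrase, score in mderank(doc, [("deep", "learning"), ("graphs",)]):
        print(f"{phrase}: {score:.3f}")
    # deep learning: 0.208
    # graphs: 0.909
```

Here masking "deep learning" removes most of the toy document's content words, so it scores a lower similarity and ranks ahead of "graphs". In a realistic setting `embed` would be replaced by a pretrained encoder's document embedding; the masking and sorting steps would stay the same.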
Pages: 396-409
Page count: 14
Related papers
50 in total
  • [21] RAKE-PMI Automated Keyphrase Extraction: An unsupervised approach for automated extraction of keyphrases
    Gupta, Somya
    Mittal, Namita
    Kumar, Alok
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATICS AND ANALYTICS (ICIA'16), 2016
  • [22] Automatic tag recommendation approach with keyphrase extraction and word embedding techniques
    Konkaew, Taechawat
    Kitisin, Sukumal
    Journal of Computers (Taiwan), 2019, 30 (02) : 135 - 149
  • [23] RAKE-PMI Automated keyphrase extraction: An unsupervised approach for automated extraction of keyphrases
    Association for Computing Machinery, New York, NY, United States, 2016 (25-26 August 2016)
  • [24] HCUKE: A Hierarchical Context-aware approach for Unsupervised Keyphrase Extraction
    Xu, Chun
    Mao, Xian-Ling
    Xin, Cheng-Xin
    Shang, Yu-Ming
    Che, Tian-Yi
    Mao, Hong-Li
    Huang, Heyan
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [25] CorePhrase: Keyphrase extraction for document clustering
    Hammouda, KM
    Matute, DN
    Kamel, MS
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2005, 3587 : 265 - 274
  • [26] Document-level Keyphrase Extraction Approach using Neighborhood Knowledge
    Li C.-L.
    Long J.-H.
    Tang Z.-L.
    Zhou T.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2021, 50 (04): 551 - 557
  • [27] PromptRank: Unsupervised Keyphrase Extraction Using Prompt
    Kong, Aobo
    Zhao, Shiwan
    Chen, Hao
    Li, Qicheng
    Qin, Yong
    Sun, Ruiqi
    Bai, Xiaoyan
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 9788 - 9801
  • [28] Unsupervised feature extraction by low-rank and sparsity preserving embedding
    Zhan, Shanhua
    Wu, Jigang
    Han, Na
    Wen, Jie
    Fang, Xiaozhao
    NEURAL NETWORKS, 2019, 109 : 56 - 66
  • [29] How Preprocessing Affects Unsupervised Keyphrase Extraction
    Wang, Rui
    Liu, Wei
    McDonald, Chris
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PT I, 2014, 8403 : 163 - 176
  • [30] NamedKeys: Unsupervised Keyphrase Extraction for Biomedical Documents
    Gero, Zelalem
    Ho, Joyce C.
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019: 328 - 337