MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Cited by: 0
Authors
Zhang, Linhan [1, 2]
Chen, Qian [2]
Wang, Wen [2]
Deng, Chong [2]
Zhang, Shiliang [2]
Li, Bing [3]
Wang, Wei [4]
Cao, Xin [1]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
[2] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
[3] A STAR Ctr Frontier AI Res CFAR, Singapore, Singapore
[4] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrase extraction (KPE) automatically extracts phrases from a document that concisely summarize its core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. Their performance degrades on long documents, because the discrepancy in sequence length between a candidate and the document causes a mismatch between their representations. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), which addresses this problem with a masking strategy: candidates are ranked by the similarity between the embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model through a novel self-supervised contrastive learning method; KPEBERT is more compatible with MDERank than vanilla BERT. Comprehensive evaluations on six KPE benchmarks demonstrate that MDERank outperforms the state-of-the-art unsupervised KPE approach by an average of 1.80 F1@15. MDERank further benefits from KPEBERT and overall achieves an average improvement of 3.53 F1@15 over the SOTA SIFRank. Our code is available at https://github.com/LinhanZ/mderank.
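The masking-and-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that lives in the repository linked above). It assumes a toy bag-of-words embedding in place of BERT, assumes each candidate token is replaced by one `[MASK]` placeholder, and assumes that a lower similarity between the original and the masked document marks a more important candidate. The names `toy_embed`, `mask_candidate`, and `rank_candidates` are hypothetical.

```python
import math
import re
from collections import Counter

MASK = "[MASK]"  # placeholder token; assumed here, mirrors BERT's mask token


def toy_embed(text):
    """Stand-in embedding: bag-of-words counts. NOT BERT; an assumption for illustration."""
    return Counter(re.findall(r"\[mask\]|\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_candidate(document, candidate):
    """Replace every occurrence of the candidate with one MASK per candidate token."""
    replacement = " ".join([MASK] * len(candidate.split()))
    pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(replacement, document)


def rank_candidates(document, candidates, embed=toy_embed):
    """Rank candidates by similarity(original doc, masked doc), lowest first.

    Assumption: masking an important phrase changes the document embedding
    more, so a lower similarity means a better keyphrase.
    """
    doc_vec = embed(document)
    scored = [
        (candidate, cosine(doc_vec, embed(mask_candidate(document, candidate))))
        for candidate in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    doc = (
        "Keyphrase extraction ranks phrases. "
        "Keyphrase extraction helps retrieval. The weather is nice."
    )
    for phrase, score in rank_candidates(doc, ["keyphrase extraction", "weather"]):
        print(f"{phrase}: {score:.3f}")
```

Because both sides of the comparison are whole documents of similar length, the sketch avoids comparing a short phrase embedding against a long document embedding, which is the mismatch the abstract attributes to earlier methods. Swapping `toy_embed` for a real sentence or document encoder is left to the reader.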
Pages: 396 - 409
Number of pages: 14
Related papers
50 items in total
  • [31] The Benefit of Document Embedding in Unsupervised Document Classification
    Novotny, Jaromir
    Ircing, Pavel
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 470 - 478
  • [32] Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning
    Skrlj, Blaz
    Jukic, Marko
    Erzen, Nika
    Pollak, Senja
    Lavrac, Nada
    DISCOVERY SCIENCE (DS 2021), 2021, 12986 : 204 - 217
  • [33] Keyword and Keyphrase Extraction from Single Hindi Document using Statistical Approach
    Siddiqi, Sifatullah
    Sharan, Aditi
    2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN) 2015, 2015, : 713 - 718
  • [34] A Ranking Approach to Keyphrase Extraction
    Jiang, Xin
    Hu, Yunhua
    Li, Hang
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009, : 756 - 757
  • [35] HyperRank: Hyperbolic Ranking Model for Unsupervised Keyphrase Extraction
    Song, Mingyang
    Liu, Huafeng
    Jing, Liping
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 16070 - 16080
  • [36] A New Scheme for Scoring Phrases in Unsupervised Keyphrase Extraction
    Florescu, Corina
    Caragea, Cornelia
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 477 - 483
  • [37] Unsupervised Keyphrase Extraction via Interpretable Neural Networks
    Joshi, Rishabh
    Balachandran, Vidhisha
    Saldanha, Emily
    Glenski, Maria
    Volkova, Svitlana
    Tsvetkov, Yulia
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1107 - 1119
  • [38] Salience Rank: Efficient Keyphrase Extraction with Topic Modeling
    Teneva, Nedelina
    Cheng, Weiwei
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 530 - 535
  • [39] Web document clustering by using automatic keyphrase extraction
    Han, Juhyun
    Kim, Taehwan
    Choi, Joongmin
    PROCEEDING OF THE 2007 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WORKSHOPS, 2007, : 56 - 59
  • [40] ISKE: An unsupervised automatic keyphrase extraction approach using the iterated sentences based on graph method
    Chi, Ling
    Hu, Liang
    KNOWLEDGE-BASED SYSTEMS, 2021, 223