BioSentVec: creating sentence embeddings for biomedical texts

Cited by: 0
Authors
Chen, Qingyu [1 ]
Peng, Yifan [1 ]
Lu, Zhiyong [1 ]
Affiliations
[1] Natl Ctr Biotechnol Informat NCBI, Natl Inst Hlth NIH, Natl Lib Med NLM, 8600 Rockville Pike, Bethesda, MD 20894 USA
Keywords
Biomedical Text Mining; Sentence Embeddings; HALLMARKS;
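DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract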
Sentence embeddings have become an essential part of today's natural language processing (NLP) systems, especially together with advanced deep learning methods. Although pre-trained sentence encoders are available in the general domain, none exists for biomedical texts to date. In this work, we introduce BioSentVec: the first open set of sentence embeddings trained with over 30 million documents from both scholarly articles in PubMed and clinical notes in the MIMIC III Clinical Database. We evaluate BioSentVec embeddings in two sentence pair similarity tasks in different biomedical text genres. Our benchmarking results demonstrate that the BioSentVec embeddings capture sentence semantics better than the competitive alternatives and achieve state-of-the-art performance in both tasks. We expect BioSentVec to facilitate research and development in biomedical text mining and to complement the existing resources in biomedical word embeddings. The embeddings are publicly available at https://github.com/ncbi-nlp/BioSentVec.
Pages: 246-250
Page count: 5