BioSentVec: creating sentence embeddings for biomedical texts

Times cited: 0
Authors
Chen, Qingyu [1]
Peng, Yifan [1]
Lu, Zhiyong [1]
Affiliation
[1] Natl Ctr Biotechnol Informat NCBI, Natl Inst Hlth NIH, Natl Lib Med NLM, 8600 Rockville Pike, Bethesda, MD 20894 USA
Keywords
Biomedical Text Mining; Sentence Embeddings; HALLMARKS
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Sentence embeddings have become an essential part of today's natural language processing (NLP) systems, especially when combined with advanced deep learning methods. Although pre-trained sentence encoders are available for the general domain, none has existed for biomedical texts to date. In this work, we introduce BioSentVec: the first open set of sentence embeddings, trained on over 30 million documents drawn from both scholarly articles in PubMed and clinical notes in the MIMIC-III Clinical Database. We evaluate BioSentVec embeddings on two sentence-pair similarity tasks covering different biomedical text genres. Our benchmarking results demonstrate that BioSentVec embeddings capture sentence semantics better than competitive alternatives and achieve state-of-the-art performance on both tasks. We expect BioSentVec to facilitate research and development in biomedical text mining and to complement existing resources for biomedical word embeddings. The embeddings are publicly available at https://github.com/ncbi-nlp/BioSentVec.
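The abstract reports results on sentence-pair similarity tasks but does not spell out how a pair is scored. A common setup, assumed here rather than taken from the paper, embeds both sentences, scores the pair by the cosine similarity of the two vectors, and reports the Pearson correlation between those scores and human-annotated similarity ratings. The following is a minimal standard-library sketch of that assumed protocol. The function names (`cosine_similarity`, `pearson`, `evaluate`) are hypothetical, and the vectors and ratings in the usage section are made-up placeholders, not BioSentVec output; loading the released model is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention (assumption): a zero vector is treated as unrelated
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between system scores and gold (human) scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sx * sy)


def evaluate(pairs, gold):
    """Score each (embedding_a, embedding_b) pair by cosine, then correlate with gold."""
    predicted = [cosine_similarity(a, b) for a, b in pairs]
    return pearson(predicted, gold)


if __name__ == "__main__":
    # Hypothetical 3-dimensional "sentence embeddings" (placeholder values only).
    pairs = [
        ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
        ([0.1, 0.9, 0.2], [0.9, 0.0, 0.1]),
        ([0.5, 0.5, 0.5], [0.4, 0.6, 0.5]),
    ]
    gold = [4.5, 0.5, 4.0]  # hypothetical human similarity ratings
    print(round(evaluate(pairs, gold), 3))
```

Cosine similarity ignores vector length, so the score reflects only the direction of the two embeddings. Pearson correlation rewards a linear agreement with the human ratings regardless of their scale, which is why the gold scores here need not lie in the same range as the cosine values.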
Pages: 246-250
Page count: 5