Quality of word and concept embeddings in targetted biomedical domains

Cited by: 0
Authors
Giancani, Salvatore [1 ,2 ,3 ]
Albertoni, Riccardo [3 ]
Catalano, Chiara Eva [3 ]
Affiliations
[1] CNRS, Inst Neurosci Timone, Unite Mixte Rech 7289, 27 Blvd Jean Moulin, F-13385 Marseille 05, France
[2] Aix Marseille Univ, Fac Med, 27 Blvd Jean Moulin, F-13385 Marseille 05, France
[3] CNR, Ist Matemat Applicata & Tecnol Informat, Via Marini 16, I-16149 Genoa, Italy
Keywords
Embedding; Quality; UMLS; Coverage; Chronic obstructive pulmonary disease; SYSTEM; RELATEDNESS; SIMILARITY; UMLS; TEXT;
DOI
10.1016/j.heliyon.2023.e16818
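Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code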
07 ; 0710 ; 09 ;
Abstract
Embeddings are fundamental resources often reused for building intelligent systems in the biomedical context. As a result, evaluating the quality of previously trained embeddings and ensuring they cover the desired information is critical for the success of applications. This paper proposes a new evaluation methodology to test the coverage of embeddings against a targeted domain of interest. It defines measures to assess the terminology, similarity, and analogy coverage, which are core aspects of the embeddings. Then, it discusses the experimentation carried out on existing biomedical embeddings in the specific context of pulmonary diseases. The proposed methodology and measures are general and may be applied to any application domain.
Pages: 15
Related papers
50 in total
  • [1] Biomedical Word Sense Disambiguation with Word Embeddings
    Antunes, Rui
    Matos, Sergio
    11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, 2017, 616 : 273 - 279
  • [2] Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings
    Sabbir, A. K. M.
    Jimeno-Yepes, Antonio
    Kavuluru, Ramakanth
    2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2017, : 163 - 170
  • [3] Improved biomedical word embeddings in the transformer era
    Noh, Jiho
    Kavuluru, Ramakanth
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 120 (120)
  • [4] Biomedical Semantic Embeddings: Using hybrid sentences to construct biomedical word embeddings and its applications
    Shaik, Arshad
    Jin, Wei
    2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019,
  • [5] Biomedical entities recognition in Spanish combining word embeddings
    Lopez-Ubeda, Pilar
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2022, (68): : 149 - 152
  • [6] Word embeddings for biomedical natural language processing: A survey
    Chiu, Billy
    Baker, Simon
    LANGUAGE AND LINGUISTICS COMPASS, 2020, 14 (12):
  • [7] Enhancing biomedical word embeddings by retrofitting to verb clusters
    Chiu, Billy
    Baker, Simon
    Palmer, Martha
    Korhonen, Anna
    SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2019), 2019, : 125 - 134
  • [8] A comparison of word embeddings for the biomedical natural language processing
    Wang, Yanshan
    Liu, Sijia
    Afzal, Naveed
    Rastegar-Mojarad, Majid
    Wang, Liwei
    Shen, Feichen
    Kingsbury, Paul
    Liu, Hongfang
    JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 87 : 12 - 20
  • [9] BioWordVec, improving biomedical word embeddings with subword information and MeSH
    Zhang, Yijia
    Chen, Qingyu
    Yang, Zhihao
    Lin, Hongfei
    Lu, Zhiyong
    SCIENTIFIC DATA, 2019, 6 (1)