The Impact of Specialized Corpora for Word Embeddings in Natural Langage Understanding

Cited by: 5
Authors
Neuraz, Antoine [1, 2]
Rance, Bastien [1]
Garcelon, Nicolas [1]
Campillos Llanos, Leonardo [2]
Burgun, Anita [1]
Rosset, Sophie [2]
Affiliations
[1] Paris Descartes, UMR 1138, INSERM, Team 22, Paris, France
[2] Univ Paris Saclay, CNRS, LIMSI, Paris, France
Source
DIGITAL PERSONALIZED HEALTH AND MEDICINE | 2020 / Volume 270
Keywords
Natural language processing; Contextual word embeddings; Natural language understanding
DOI
10.3233/SHTI200197
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Recent studies in the biomedical domain suggest that learning statistical word representations (static or contextualized word embeddings) on large corpora of specialized data improves results on downstream natural language processing (NLP) tasks. In this paper, we explore the impact of the data source used to learn word representations on a natural language understanding task. We compared fastText (static) and ELMo (contextualized) embeddings, each learned either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data, with F1-score gains of +7% and +4% on the two sub-tasks. Moreover, the ELMo representations were trained with only a fraction of the data used for fastText.
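The abstract reports its comparison as gains in F1-score. As a minimal sketch of how such a gain can be computed, the Python snippet below derives F1 from true-positive, false-positive, and false-negative counts for two systems and takes their difference. The function name, the variable names, and all counts are hypothetical illustrations and are not taken from the paper; the abstract also does not state whether its gains are absolute or relative, so the absolute difference shown here is an assumption.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two systems scored on the same test set.
# These numbers are illustrative only and do not come from the paper.
baseline_f1 = f1_score(tp=80, fp=20, fn=20)
candidate_f1 = f1_score(tp=85, fp=15, fn=15)

# Absolute gain in F1 (an assumption about how a "gain" is measured).
gain = candidate_f1 - baseline_f1
print(f"baseline F1={baseline_f1:.2f}, candidate F1={candidate_f1:.2f}, gain={gain:+.2f}")
```

Computing F1 from counts rather than from pre-rounded precision and recall avoids compounding rounding error when two systems are close.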
Pages: 432-436
Number of pages: 5