The Impact of Specialized Corpora for Word Embeddings in Natural Language Understanding

Cited by: 5
Authors
Neuraz, Antoine [1 ,2 ]
Rance, Bastien [1 ]
Garcelon, Nicolas [1 ]
Llanos, Leonardo Campillos [2 ]
Burgun, Anita [1 ]
Rosset, Sophie [2 ]
Affiliations
[1] Paris Descartes, UMR 1138, INSERM, Team 22, Paris, France
[2] Univ Paris Saclay, CNRS, LIMSI, Paris, France
Source
DIGITAL PERSONALIZED HEALTH AND MEDICINE | 2020 / Vol. 270
Keywords
Natural language processing; Contextual word embeddings; Natural language understanding;
DOI
10.3233/SHTI200197
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
Recent studies in the biomedical domain suggest that learning statistical word representations (static or contextualized word embeddings) on large corpora of specialized data improves the results on downstream natural language processing (NLP) tasks. In this paper, we explore the impact of the data source of word representations on a natural language understanding task. We compared embeddings learned with Fasttext (static embedding) and ELMo (contextualized embedding), trained either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for the two sub-tasks (gains of +7% and +4% in F1-score). Moreover, ELMo representations were trained with only a fraction of the data used for Fasttext.
Pages: 432 - 436
Number of pages: 5
Related papers
50 in total
  • [31] Domain specific word embeddings for natural language processing in radiology
    Chen, Timothy L.
    Emerling, Max
    Chaudhari, Gunvant R.
    Chillakuru, Yeshwant R.
    Seo, Youngho
    Vu, Thienkhai H.
    Sohn, Jae Ho
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 113
  • [32] Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency Across Child and Adult Language Corpora of More Than 65 Million Words
    Charlesworth, Tessa E. S.
    Yang, Victor
    Mann, Thomas C.
    Kurdi, Benedek
    Banaji, Mahzarin R.
    PSYCHOLOGICAL SCIENCE, 2021, 32 (02) : 218 - 240
  • [33] Vietnamese Antonyms Detection Based on Specialized Word Embeddings using Semantic Knowledge and Distributional Information
    Van-Tan Bui
    Khac-Quy Dinh
    Phuong-Thai Nguyen
    2020 12TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (IEEE KSE 2020), 2020, : 159 - 164
  • [34] Word Embeddings Reveal How Fundamental Sentiments Structure Natural Language
    van Loon, Austin
    Freese, Jeremy
    AMERICAN BEHAVIORAL SCIENTIST, 2023, 67 (02) : 175 - 200
  • [35] Enriching Word Embeddings with Fuzzy Systems for Natural Language Processing Tasks
    Seth, Taniya
    Muhuri, Pranab K.
    2024 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, FUZZ-IEEE 2024, 2024,
  • [36] Impact of Cultural Connotation on Word Understanding
    黄伟
    读与写(教育教学刊), 2012, 9 (10) : 3 - 5
  • [37] JOINT LEARNING OF WORD AND LABEL EMBEDDINGS FOR SEQUENCE LABELLING IN SPOKEN LANGUAGE UNDERSTANDING
    Wu, Jiewen
    D'Haro, Luis Fernando
    Chen, Nancy F.
    Krishnaswamy, Pavitra
    Banchs, Rafael E.
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 800 - 806
  • [38] Understanding the Semantic Content of Sparse Word Embeddings Using a Commonsense Knowledge Base
    Balogh, Vanda
    Berend, Gabor
    Diochnos, Dimitrios, I
    Turan, Gyorgy
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7399 - 7406
  • [39] Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition
    Unanue, Inigo Jauregi
    Borzeshi, Ehsan Zare
    Piccardi, Massimo
    JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 76 : 102 - 109
  • [40] Application of specialized word embeddings and named entity and attribute recognition to the problem of unsupervised automated clinical coding
    Nath, Namrata
    Lee, Sang-Heon
    Lee, Ivan
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 165