The Impact of Specialized Corpora for Word Embeddings in Natural Language Understanding

Cited by: 5
Authors
Neuraz, Antoine [1, 2]
Rance, Bastien [1]
Garcelon, Nicolas [1]
Llanos, Leonardo Campillos [2]
Burgun, Anita [1]
Rosset, Sophie [2]
Affiliations
[1] Paris Descartes, UMR 1138, INSERM, Team 22, Paris, France
[2] Univ Paris Saclay, CNRS, LIMSI, Paris, France
Source
Keywords
Natural language processing; Contextual word embeddings; Natural language understanding
DOI
10.3233/SHTI200197
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Subject classification number
Abstract
Recent studies in the biomedical domain suggest that learning statistical word representations (static or contextualized word embeddings) on large corpora of specialized data improves results on downstream natural language processing (NLP) tasks. In this paper, we explore the impact of the data source of word representations on a natural language understanding task. We compared embeddings learned with fastText (static embeddings) and ELMo (contextualized embeddings), trained either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results on both sub-tasks were obtained with ELMo representations trained on EHR data (F1-score gains of +7% and +4%). Moreover, the ELMo representations were trained on only a fraction of the data used for fastText.
Pages: 432-436
Number of pages: 5