Sentence Selection Strategies for Distilling Word Embeddings from BERT

Times cited: 0
Authors
Wang, Yixiao [1 ]
Bouraoui, Zied [2 ]
Espinosa-Anke, Luis [1 ]
Schockaert, Steven [1 ]
Affiliations
[1] Cardiff Univ, Cardiff, S Glam, Wales
[2] Univ Artois, CNRS, CRIL, Arras, France
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Word Embeddings; Language Models; Natural Language Processing;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word are available. To this end, we analyse a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.
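The abstract does not state how one static vector is obtained from a word's contextualised representations. A common approach, assumed here, is to average the vectors a model such as BERT produces for the word in each sentence where it occurs. The minimal Python sketch below illustrates that averaging step with a pluggable sentence-selection strategy. All names (`mean_vector`, `static_word_vector`, `random_selector`, `Selector`) are hypothetical. The random selector is only a baseline stand-in, not one of the paper's strategies, and synthetic vectors replace real BERT outputs.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: a selection strategy picks k of the candidate mentions.
Selector = Callable[[Sequence[Vector], int], List[Vector]]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise average of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def random_selector(seed: int = 0) -> Selector:
    """Baseline strategy (assumption): pick k mentions uniformly at random, without replacement."""
    rng = random.Random(seed)

    def select(vectors: Sequence[Vector], k: int) -> List[Vector]:
        # If fewer than k mentions exist, return all of them.
        return rng.sample(list(vectors), min(k, len(vectors)))

    return select


def static_word_vector(mentions: Sequence[Vector], select: Selector, k: int) -> Vector:
    """Distil one static vector by averaging the contextualised vectors of k selected mentions."""
    return mean_vector(select(mentions, k))


if __name__ == "__main__":
    # Synthetic stand-in for BERT output: 200 noisy 4-d "contextualised"
    # vectors scattered around a shared direction (not real embeddings).
    rng = random.Random(42)
    base = [1.0, 0.5, -0.3, 0.2]
    mentions = [[b + rng.gauss(0, 0.3) for b in base] for _ in range(200)]

    reference = mean_vector(mentions)  # average over every available mention
    for k in (1, 5, 10, 50):
        approx = static_word_vector(mentions, random_selector(seed=k), k)
        print(f"k={k:3d}  cosine to full average = {cosine(approx, reference):.4f}")
```

A smarter `Selector` is where a sentence selection strategy would plug in. Comparing the k-sentence average with the all-mentions average is only a rough proxy for the question the abstract raises; the abstract does not say how the paper itself evaluates vector quality.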
Pages: 2591-2600
Number of pages: 10