Sentence Selection Strategies for Distilling Word Embeddings from BERT

Cited: 0
Authors
Wang, Yixiao [1 ]
Bouraoui, Zied [2 ]
Espinosa-Anke, Luis [1 ]
Schockaert, Steven [1 ]
Affiliations
[1] Cardiff Univ, Cardiff, S Glam, Wales
[2] Univ Artois, CNRS, CRIL, Arras, France
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Word Embeddings; Language Models; Natural Language Processing;
DOI
None available
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Classification Codes
081203 ; 0835 ;
Abstract
Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyze a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.
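The abstract describes distilling a static word vector from only a handful of contextualised mention vectors, chosen by a sentence selection strategy. A minimal NumPy sketch of that idea, using a centroid-based selection heuristic as a stand-in (the paper evaluates several strategies; this one is purely illustrative) and mock mention vectors in place of real BERT outputs:

```python
import numpy as np

def distill_word_vector(context_vecs, k=5):
    """Average the k contextual vectors closest to the centroid.

    context_vecs: (n, d) array of contextualised vectors for one word,
    one row per sentence mention. Keeping the k mentions nearest the
    centroid is an illustrative selection heuristic, not the paper's.
    """
    context_vecs = np.asarray(context_vecs, dtype=float)
    centroid = context_vecs.mean(axis=0)
    # Cosine similarity of each mention vector to the centroid.
    sims = context_vecs @ centroid / (
        np.linalg.norm(context_vecs, axis=1) * np.linalg.norm(centroid) + 1e-12
    )
    top = np.argsort(-sims)[:k]       # indices of the k most "typical" mentions
    return context_vecs[top].mean(axis=0)

rng = np.random.default_rng(0)
vecs = rng.normal(size=(50, 8))       # 50 mock mention vectors, dimension 8
wv = distill_word_vector(vecs, k=5)
print(wv.shape)                       # (8,)
```

With real BERT outputs, `context_vecs` would be the hidden states of the target word's token across sentences; the result is a single static embedding, matching the paper's finding that 5 to 10 well-chosen sentences can suffice.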
Pages: 2591 - 2600
Page count: 10
Related Papers
50 total
  • [21] Learning Word and Sentence Embeddings Using a Generative Convolutional Network
    Vargas-Ocampo, Edgar
    Roman-Rangel, Edgar
    Hermosillo-Valadez, Jorge
    PATTERN RECOGNITION, 2018, 10880 : 135 - 144
  • [22] Developing a sentence level fairness metric using word embeddings
    Izzidien, Ahmed
    Fitz, Stephen
    Romero, Peter
    Loe, Bao S.
    Stillwell, David
    International Journal of Digital Humanities, 2023, 5 (2-3) : 95 - 130
  • [23] A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection
    Xu, Chenglin
    Xie, Lei
    Xiao, Xiong
    Journal of Signal Processing Systems, 2018, 90 : 1063 - 1075
  • [24] Sentence-Level Sentiment Analysis Using Feature Vectors from Word Embeddings
    Hayashi, Toshitaka
    Fujita, Hamido
    NEW TRENDS IN INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES (SOMET_18), 2018, 303 : 749 - 758
  • [25] A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
    Levy, Omer
    Sogaard, Anders
    Goldberg, Yoav
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 765 - 774
  • [26] On Character vs Word Embeddings as Input for English Sentence Classification
    Hammerton, James
    Vintro, Merce
    Kapetanakis, Stelios
    Sama, Michele
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 550 - 566
  • [27] What do BERT word embeddings learn about the French language?
    Goliakova, Ekaterina
    Langlois, David
    PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE COMPUTATIONAL LINGUISTICS IN BULGARIA, CLIB 2024, 2024, : 14 - 32
  • [28] A Study on the Relevance of Generic Word Embeddings for Sentence Classification in Hepatic Surgery
    Oukelmoun, Achir
    Semmar, Nasredine
    de Chalendar, Gael
    Habran, Enguerrand
    Vibert, Eric
    Goblet, Emma
    Oukelmoun, Mariame
    Allard, Marc-Antoine
    2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA, 2023,
  • [29] Performance Evaluation of Word and Sentence Embeddings for Finance Headlines Sentiment Analysis
    Mishev, Kostadin
    Gjorgjevikj, Ana
    Stojanov, Riste
    Mishkovski, Igor
    Vodenska, Irena
    Chitkushev, Ljubomir
    Trajanov, Dimitar
    ICT INNOVATIONS 2019: BIG DATA PROCESSING AND MINING, 2019, 1110 : 161 - 172
  • [30] Improving Implicit Stance Classification in Tweets Using Word and Sentence Embeddings
    Schaefer, Robin
    Stede, Manfred
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2019, 2019, 11793 : 299 - 307