Sentence Selection Strategies for Distilling Word Embeddings from BERT

Cited by: 0
Authors:
Wang, Yixiao [1 ]
Bouraoui, Zied [2 ]
Espinosa-Anke, Luis [1 ]
Schockaert, Steven [1 ]
Affiliations:
[1] Cardiff Univ, Cardiff, S Glam, Wales
[2] Univ Artois, CNRS, CRIL, Arras, France
Funding: Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords: Word Embeddings; Language Models; Natural Language Processing
DOI: not available
Chinese Library Classification: TP39 (Computer applications)
Discipline codes: 081203; 0835
Abstract:
Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyze a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.
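The pipeline the abstract describes, scoring candidate sentences for informativeness and distilling a static word vector from the contextualised representations of only the top few, can be sketched as follows. This is a minimal illustration, not the paper's actual selection strategies: the `distill_word_vector` helper and the random scores and vectors are assumptions standing in for a real scoring function and real BERT output.

```python
import numpy as np

def distill_word_vector(contextual_vecs, scores, k=5):
    """Average the contextualised vectors of the k highest-scoring
    sentences to obtain a single static word vector (illustrative sketch)."""
    top = np.argsort(scores)[::-1][:k]  # indices of the k most informative sentences
    return contextual_vecs[top].mean(axis=0)

# Toy data standing in for BERT output: 20 candidate sentences, each
# yielding one 8-dimensional contextualised vector for the target word.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(20, 8))
scores = rng.random(20)  # hypothetical informativeness scores, one per sentence
vector = distill_word_vector(vecs, scores, k=5)
print(vector.shape)  # prints (8,)
```

In practice the contextualised vectors would come from a pretrained model such as BERT, and the scoring function is exactly what the paper's selection strategies vary; with a good one, the abstract reports that 5 to 10 sentences suffice.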
Pages: 2591-2600 (10 pages)