Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

Cited by: 0
Authors
Barkan, Oren [1]
Razin, Noam [1,2]
Malkiel, Itzik [1,2]
Katz, Ori [1,3]
Caciularu, Avi [1,4]
Koenigstein, Noam [1,2]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Tel Aviv Univ, Tel Aviv, Israel
[3] Technion, Haifa, Israel
[4] Bar Ilan Univ, Ramat Gan, Israel
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations, a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences requires propagating every query-candidate sentence pair through a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between the sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE), a model based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks. The outline of DSE is as follows: given a cross-attentive teacher model (e.g., a fine-tuned BERT), we train a sentence-embedding-based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMo variants and other sentence embedding methods, while accelerating computation of query-candidate sentence-pair similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.
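
As a rough illustration of the two ideas in the abstract (not the authors' released implementation, which is linked above), the following NumPy sketch shows how precomputed candidate embeddings reduce query-candidate scoring to elementary operations, and how a distillation objective regresses the student's pair scores onto a cross-attentive teacher's scores. The function names embed and teacher_score are hypothetical placeholders.

import numpy as np

dim, n_candidates = 8, 5

def embed(sentence: str) -> np.ndarray:
    # Hypothetical stand-in for the student's sentence-to-vector mapping.
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(dim)

def teacher_score(a: str, b: str) -> float:
    # Hypothetical stand-in for a cross-attentive teacher (e.g., a fine-tuned
    # BERT) that must process both sentences jointly to produce a pair score.
    rng = np.random.default_rng(abs(hash((a, b))) % (2**32))
    return float(rng.uniform())

candidates = [f"candidate sentence {i}" for i in range(n_candidates)]
query = "query sentence"

# Fast path: candidate vectors are embedded once and cached; each new query
# costs one embedding plus a matrix-vector product (elementary operations),
# instead of one cross-attentive forward pass per query-candidate pair.
candidate_matrix = np.stack([embed(c) for c in candidates])  # (n_candidates, dim)
student_scores = candidate_matrix @ embed(query)             # (n_candidates,)

# Distillation objective: the student reconstructs the teacher's pair scores,
# here with a mean-squared error over the query-candidate pairs.
teacher_scores = np.array([teacher_score(query, c) for c in candidates])
mse = float(np.mean((student_scores - teacher_scores) ** 2))
print("student similarities:", np.round(student_scores, 3))
print("distillation MSE:", round(mse, 3))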
Pages: 3235-3242
Number of pages: 8