Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

Cited by: 0
Authors
Barkan, Oren [1]
Razin, Noam [1,2]
Malkiel, Itzik [1,2]
Katz, Ori [1,3]
Caciularu, Avi [1,4]
Koenigstein, Noam [1,2]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Tel Aviv Univ, Tel Aviv, Israel
[3] Technion, Haifa, Israel
[4] Bar Ilan Univ, Ramat Gan, Israel
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations, a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences requires propagating every query-candidate sentence pair through a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between the sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE), a model based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks. The outline of DSE is as follows: given a cross-attentive teacher model (e.g., a fine-tuned BERT), we train a sentence-embedding-based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMo variants and other sentence embedding methods, while accelerating computation of query-candidate sentence-pair similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.
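
As a rough illustration of the two ideas in the abstract (not the authors' released implementation, which is linked above), the following NumPy sketch shows how precomputed candidate embeddings reduce query-candidate scoring to elementary operations, and how a distillation objective regresses the student's pair scores onto a cross-attentive teacher's scores. The function names embed and teacher_score are hypothetical placeholders.

import numpy as np

dim, n_candidates = 8, 5

def embed(sentence: str) -> np.ndarray:
    # Hypothetical stand-in for the student's sentence-to-vector mapping.
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(dim)

def teacher_score(a: str, b: str) -> float:
    # Hypothetical stand-in for a cross-attentive teacher (e.g., a fine-tuned
    # BERT) that must process both sentences jointly to produce a pair score.
    rng = np.random.default_rng(abs(hash((a, b))) % (2**32))
    return float(rng.uniform())

candidates = [f"candidate sentence {i}" for i in range(n_candidates)]
query = "query sentence"

# Fast path: candidate vectors are embedded once and cached; each new query
# costs one embedding plus a matrix-vector product (elementary operations),
# instead of one cross-attentive forward pass per query-candidate pair.
candidate_matrix = np.stack([embed(c) for c in candidates])  # (n_candidates, dim)
student_scores = candidate_matrix @ embed(query)             # (n_candidates,)

# Distillation objective: the student reconstructs the teacher's pair scores,
# here with a mean-squared error over the query-candidate pairs.
teacher_scores = np.array([teacher_score(query, c) for c in candidates])
mse = float(np.mean((student_scores - teacher_scores) ** 2))
print("student similarities:", np.round(student_scores, 3))
print("distillation MSE:", round(mse, 3))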
Pages: 3235-3242
Number of pages: 8