Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Authors
Hirota, Wataru [1, 2]
Suhara, Yoshihiko [2]
Golshan, Behzad [2]
Tan, Wang-Chiew [2]
Affiliations
[1] Osaka Univ, Suita, Osaka, Japan
[2] Megagon Labs, Mountain View, CA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present EMU, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.
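The interplay between the two components can be illustrated with a minimal sketch. This is not the paper's actual loss formulation: it assumes plain cross-entropy for both the semantic classifier and the language discriminator, and the function names and the weight `lam` are hypothetical. The embedding model is penalized when intents are misclassified and rewarded when the discriminator cannot tell which language a sentence came from.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of a single example given raw logits and a class index."""
    return -math.log(softmax(logits)[target])


def encoder_loss(intent_logits, intent_label, lang_logits, lang_label, lam=0.1):
    """Hypothetical combined objective for the embedding model.

    semantic: loss of the semantic (intent) classifier, to be minimized.
    language: loss of the language discriminator; the encoder wants this
              to be HIGH (adversarial term), hence the minus sign.
    lam:      assumed trade-off weight, not taken from the paper.
    """
    semantic = cross_entropy(intent_logits, intent_label)
    language = cross_entropy(lang_logits, lang_label)
    return semantic - lam * language


# A discriminator that is confused about the language (uniform logits)
# yields a lower encoder loss than one that identifies it confidently.
confused = encoder_loss([2.0, 0.0], 0, [0.0, 0.0], 0)
confident = encoder_loss([2.0, 0.0], 0, [5.0, 0.0], 0)
print(confused < confident)
```

In a real adversarial setup the discriminator would be trained in alternation to minimize its own loss, while the embedding model minimizes the combined objective above.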
Pages: 7935-7943
Page count: 9