IN PURSUIT OF BABEL - MULTILINGUAL END-TO-END SPOKEN LANGUAGE UNDERSTANDING

Cited by: 1
Authors
Mueller, Markus [1]
Choudhary, Samridhi [1]
Chung, Clement [1]
Mouchtaris, Athanasios [1]
Kunzmann, Siegfried [1]
Affiliations
[1] Amazon Alexa AI, Seattle, WA 98121 USA
Keywords
spoken language understanding; multilingual; speech recognition; human-computer interaction;
DOI
10.1109/ASRU51503.2021.9688263
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
End-to-end spoken language understanding (E2E SLU) systems predict utterance semantics directly from speech. To the best of our knowledge, E2E models have so far been trained to recognize the semantics of only a single language. In this work, we introduce the first multilingual E2E SLU system and present results across three languages: English, Spanish, and French. We propose a transformer-based, multilingual acoustic encoder for intent prediction that leverages pre-training for both the acoustic and linguistic modalities of the SLU model. It learns a robust, cross-modal latent space using a pre-trained multilingual BERT as a semantic teacher. The best-performing model achieves relative improvements of 7.2% in the single-language setting, 5-6% in the two-language settings, and 4-6% in the three-language settings. An intent-wise analysis shows that semantic supervision becomes more important for shorter utterances, while providing an explicit language identifier at the input leads to lower intent classification errors.
Pages: 1042-1049
Number of pages: 8
Related papers
50 in total
  • [1] End-to-End Cross-Lingual Spoken Language Understanding Model with Multilingual Pretraining
    Zhang, Xianwei
    He, Liang
    [J]. INTERSPEECH 2021, 2021, : 4728 - 4732
  • [2] TOWARDS END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Serdyuk, Dmitriy
    Wang, Yongqiang
    Fuegen, Christian
    Kumar, Anuj
    Liu, Baiyang
    Bengio, Yoshua
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5754 - 5758
  • [3] Semantic Complexity in End-to-End Spoken Language Understanding
    McKenna, Joseph P.
    Choudhary, Samridhi
    Saxon, Michael
    Strimel, Grant P.
    Mouchtaris, Athanasios
    [J]. INTERSPEECH 2020, 2020, : 4273 - 4277
  • [4] A Streaming End-to-End Framework For Spoken Language Understanding
    Potdar, Nihal
    Avila, Anderson R.
    Xing, Chao
    Wang, Dong
    Cao, Yiran
    Chen, Xiao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3906 - 3914
  • [5] WhiSLU: End-to-End Spoken Language Understanding with Whisper
    Wang, Minghan
    Li, Yinglu
    Guo, Jiaxin
    Qiao, Xiaosong
    Li, Zongyao
    Shang, Hengchao
    Wei, Daimeng
    Tao, Shimin
    Zhang, Min
    Yang, Hao
    [J]. INTERSPEECH 2023, 2023, : 770 - 774
  • [6] End-to-End Spoken Language Understanding for Generalized Voice Assistants
    Saxon, Michael
    Choudhary, Samridhi
    McKenna, Joseph P.
    Mouchtaris, Athanasios
    [J]. INTERSPEECH 2021, 2021, : 4738 - 4742
  • [7] End-to-End Spoken Language Understanding Without Full Transcripts
    Kuo, Hong-Kwang J.
    Tuske, Zoltan
    Thomas, Samuel
    Huang, Yinghui
    Audhkhasi, Kartik
    Kingsbury, Brian
    Kurata, Gakuto
    Kons, Zvi
    Hoory, Ron
    Lastras, Luis
    [J]. INTERSPEECH 2020, 2020, : 906 - 910
  • [8] Privacy-Preserving End-to-End Spoken Language Understanding
    Wang, Yinggui
    Huang, Wei
    Yang, Le
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5224 - 5232
  • [9] ERROR ANALYSIS APPLIED TO END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Caubriere, Antoine
    Ghannay, Sahar
    Tomashenko, Natalia
    De Mori, Renato
    Laurent, Antoine
    Morin, Emmanuel
    Esteve, Yannick
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8514 - 8518
  • [10] End-to-End Neural Transformer Based Spoken Language Understanding
    Radfar, Martin
    Mouchtaris, Athanasios
    Kunzmann, Siegfried
    [J]. INTERSPEECH 2020, 2020, : 866 - 870