SPEECH-LANGUAGE PRE-TRAINING FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING

Cited by: 18
Authors
Qian, Yao [1]
Bian, Ximo [2]
Shi, Yu [1]
Kanda, Naoyuki [1]
Shen, Leo [1]
Xiao, Zhen [1]
Zeng, Michael [1]
Affiliations
[1] Microsoft Cognitive Services Research Group, Bellevue, WA 98004, USA
[2] Beijing Institute of Technology, Beijing, People's Republic of China
Keywords
spoken language understanding; end-to-end approach; pre-training; transfer learning; self-supervised learning
DOI
10.1109/ICASSP39728.2021.9414900
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from speech signals without cascading an automatic speech recognizer (ASR) with a natural language understanding (NLU) module. However, paired utterance recordings and corresponding semantics may not always be available, or sufficient, to train an E2E SLU model in a real production environment. In this paper, we propose to unify a well-optimized E2E ASR encoder (speech) and a pre-trained language model encoder (language) into a transformer decoder. The unified speech-language pre-trained model (SLP) is continually enhanced on limited labeled data from a target domain using a conditional masked language model (MLM) objective, and can thus effectively generate a sequence of intent, slot type, and slot value for given input speech at inference. Experimental results on two public corpora show that our approach to E2E SLU is superior to the conventional cascaded method. It also outperforms the current state-of-the-art E2E SLU approaches while using much less paired data.
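The abstract says only that the decoder outputs a sequence of intent, slot type, and slot value and that fine-tuning uses a conditional MLM objective. A minimal sketch of the data side of such an objective, assuming a simple flattening of the semantic frame and uniform random masking, might look like the following. The flattening order, the `[MASK]` token, the 0.3 masking rate, and the function names `linearize_semantics` and `conditional_mlm_mask` are hypothetical choices for illustration, not the authors' implementation; the speech and language encoders that the decoder conditions on are not modeled.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token


def linearize_semantics(intent: str, slots: Dict[str, str]) -> List[str]:
    """Flatten an SLU frame into one target sequence: the intent first,
    then each slot type followed by its value tokens.
    (Assumed serialization; the paper's exact format is not given here.)"""
    tokens = [intent]
    for slot_type, slot_value in slots.items():
        tokens.append(slot_type)
        tokens.extend(slot_value.split())
    return tokens


def conditional_mlm_mask(
    tokens: List[str], mask_prob: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random subset of target tokens with MASK.

    Returns (decoder_input, labels): labels hold the original token at masked
    positions and None elsewhere (no loss there). In a conditional MLM the
    decoder would predict these labels while attending to the speech and
    text encoder outputs, which this sketch does not include."""
    if not tokens:
        return [], []
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = tok
    # Guarantee at least one prediction target per training example.
    if all(label is None for label in labels):
        i = rng.randrange(len(tokens))
        masked[i], labels[i] = MASK, tokens[i]
    return masked, labels


if __name__ == "__main__":
    rng = random.Random(0)
    target = linearize_semantics("SetAlarm", {"time": "seven am", "date": "tomorrow"})
    print(target)  # ['SetAlarm', 'time', 'seven', 'am', 'date', 'tomorrow']
    decoder_input, labels = conditional_mlm_mask(target, mask_prob=0.3, rng=rng)
    print(decoder_input)
    print(labels)
```

Only masked positions carry a label, so the training loss would be computed on those tokens alone; the "conditional" part comes from the decoder attending to the encoder outputs while it fills them in.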
Pages: 7458-7462
Page count: 5