SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding

Cited by: 0
Authors
Chung, Yu-An [1 ]
Zhu, Chenguang [2 ]
Zeng, Michael [2 ]
Affiliations
[1] MIT, Computer Science & Artificial Intelligence Laboratory, Cambridge, MA 02139, USA
[2] Microsoft Cognitive Services Group, Redmond, WA, USA
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Spoken language understanding (SLU) requires a model to analyze an input acoustic signal to understand its linguistic content and make predictions. To boost the models' performance, various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text. However, the inherent disparities between the two modalities necessitate a mutual analysis. In this paper, we propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules. Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text. Thus, during fine-tuning, the speech module alone can produce representations carrying both acoustic information and contextual semantic knowledge of an input acoustic signal. Experimental results verify the effectiveness of our approach on various SLU tasks. For example, SPLAT improves the previous state-of-the-art performance on the Spoken SQuAD dataset by more than 10%.
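The abstract describes two training signals: self-supervised masked prediction on each module with unpaired data, and an alignment loss that pulls paired speech and text representations into a shared latent space. The sketch below illustrates how these objectives can be combined in PyTorch; the class name, layer sizes, the discretized speech targets, and the mean-pooling-plus-MSE alignment loss are illustrative assumptions, not the paper's exact architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeechLanguageJointPretrainer(nn.Module):
    """Minimal sketch of the two objectives in the abstract: masked prediction
    on each modality with unpaired data, plus an alignment loss on paired data."""

    def __init__(self, d_model=768, vocab_size=30522, n_speech_codes=512):
        super().__init__()
        # Speech module: Transformer over (masked) acoustic feature frames.
        # Assumes acoustic features are already projected to d_model dimensions.
        self.speech_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)
        # Language module: BERT-style masked language model over text tokens.
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)
        self.text_mlm_head = nn.Linear(d_model, vocab_size)
        # Hypothetical head predicting discretized targets for masked speech frames.
        self.speech_mlm_head = nn.Linear(d_model, n_speech_codes)

    def mlm_losses(self, masked_speech, speech_targets, masked_tokens, token_targets):
        """Self-supervised masked-prediction losses on unpaired speech and text.
        Target tensors hold -100 at unmasked positions so those are ignored."""
        s = self.speech_encoder(masked_speech)                   # (B, T_s, d)
        t = self.text_encoder(self.text_embed(masked_tokens))    # (B, T_t, d)
        speech_loss = F.cross_entropy(
            self.speech_mlm_head(s).transpose(1, 2), speech_targets, ignore_index=-100)
        text_loss = F.cross_entropy(
            self.text_mlm_head(t).transpose(1, 2), token_targets, ignore_index=-100)
        return speech_loss + text_loss

    def alignment_loss(self, paired_speech, paired_tokens):
        """Pull paired speech/text representations together in a shared latent
        space (mean pooling + MSE here; an illustrative choice only)."""
        s = self.speech_encoder(paired_speech).mean(dim=1)                  # (B, d)
        t = self.text_encoder(self.text_embed(paired_tokens)).mean(dim=1)   # (B, d)
        return F.mse_loss(s, t)
```

A training loop would mix the two terms, e.g. `loss = model.mlm_losses(...) + lam * model.alignment_loss(...)`, computing the alignment term only on the small paired batch; after pre-training, the speech module alone would be fine-tuned on the downstream SLU task, as the abstract describes.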
Pages: 1897-1907 (11 pages)
Related Papers (50 in total; first 10 shown)
  • [1] Speech-Language Pre-Training for End-to-End Spoken Language Understanding. Qian, Yao; Bian, Ximo; Shi, Yu; Kanda, Naoyuki; Shen, Leo; Xiao, Zhen; Zeng, Michael. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 7458-7462.
  • [2] A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech. Wang, Pu; BabaAli, Bagher; Van Hamme, Hugo. Interspeech 2021, 2021: 36-40.
  • [3] Speech Model Pre-training for End-to-End Spoken Language Understanding. Lugosch, Loren; Ravanelli, Mirco; Ignoto, Patrick; Tomar, Vikrant Singh; Bengio, Yoshua. Interspeech 2019, 2019: 814-818.
  • [4] Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning. Chen, Qian; Wang, Wen; Zhang, Qinglin. Interspeech 2021, 2021: 1244-1248.
  • [5] Pre-training for Query Rewriting in a Spoken Language Understanding System. Chen, Zheng; Fan, Xing; Ling, Yuan; Mathias, Lambert; Guo, Chenlei. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), 2020: 7969-7973.
  • [6] JAKET: Joint Pre-training of Knowledge Graph and Language Understanding. Yu, Donghan; Zhu, Chenguang; Yang, Yiming; Zeng, Michael. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022), 2022: 11630-11638.
  • [7] A New Pre-training Method for Training Deep Learning Models with Application to Spoken Language Understanding. Celikyilmaz, Asli; Sarikaya, Ruhi; Hakkani-Tur, Dilek; Liu, Xiaohu; Ramesh, Nikhil; Tur, Gokhan. Interspeech 2016, 2016: 3255-3259.
  • [8] On Joint Training with Interfaces for Spoken Language Understanding. Raju, Anirudh; Rao, Milind; Tiwari, Gautam; Dheram, Pranav; Anderson, Bryan; Zhang, Zhe; Lee, Chul; Bui, Bach; Rastrow, Ariya. Interspeech 2022, 2022: 1253-1257.
  • [9] Efficient Learning for Spoken Language Understanding Tasks with Word Embedding Based Pre-training. Luan, Yi; Watanabe, Shinji; Harsham, Bret. Interspeech 2015, 2015: 1398-1402.
  • [10] Unified Language Model Pre-training for Natural Language Understanding and Generation. Dong, Li; Yang, Nan; Wang, Wenhui; Wei, Furu; Liu, Xiaodong; Wang, Yu; Gao, Jianfeng; Zhou, Ming; Hon, Hsiao-Wuen. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019.