Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

Cited by: 5
Authors
Chen, Qian [1]
Wang, Wen [1]
Zhang, Qinglin [1]
Affiliations
[1] Alibaba Group, Speech Lab, Hangzhou, Zhejiang, People's Republic of China
Source
INTERSPEECH 2021
Keywords
spoken language understanding; pre-training; joint text and speech representation learning; NETWORKS; SPEECH; ASR;
DOI
10.21437/Interspeech.2021-234
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In the traditional cascading architecture for spoken language understanding (SLU), automatic speech recognition (ASR) errors have been observed to be detrimental to the performance of natural language understanding. End-to-end (E2E) SLU models have been proposed to directly map speech input to the desired semantic frame with a single model, thereby mitigating ASR error propagation. Recently, pre-training techniques have been explored for these E2E models. In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors. We explore phoneme labels as high-level speech features, and design and compare pre-training tasks based on conditional masked language model objectives and inter-sentence relation objectives. We also investigate the efficacy of combining textual and phonetic information during fine-tuning. Experimental results on the spoken language understanding benchmarks Fluent Speech Commands and SNIPS show that the proposed approach significantly outperforms strong baseline models and improves the robustness of spoken language understanding to ASR errors.
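To make the joint textual-phonetic pre-training idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a masked language model over paired subword and phoneme sequences: both streams share one Transformer encoder, and masked positions in either stream are predicted from the joint context. All class names, vocabulary sizes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumptions only, not the paper's code) of a joint
# textual-phonetic masked-LM objective: subword tokens and phoneme labels are
# embedded, concatenated, and encoded together so that masked positions in
# either stream can be predicted from the cross-modal context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointTextPhoneEncoder(nn.Module):
    def __init__(self, text_vocab=30000, phone_vocab=100, d_model=256,
                 nhead=4, num_layers=4):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.phone_emb = nn.Embedding(phone_vocab, d_model)
        # Segment embedding distinguishes the textual from the phonetic stream;
        # positional encodings are omitted here for brevity.
        self.seg_emb = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=1024,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.text_head = nn.Linear(d_model, text_vocab)    # masked-token prediction
        self.phone_head = nn.Linear(d_model, phone_vocab)  # masked-phoneme prediction

    def forward(self, text_ids, phone_ids):
        # Concatenate both streams so attention flows across modalities.
        t = self.text_emb(text_ids) + self.seg_emb(torch.zeros_like(text_ids))
        p = self.phone_emb(phone_ids) + self.seg_emb(torch.ones_like(phone_ids))
        h = self.encoder(torch.cat([t, p], dim=1))
        h_text, h_phone = h[:, :text_ids.size(1)], h[:, text_ids.size(1):]
        return self.text_head(h_text), self.phone_head(h_phone)

def masked_lm_loss(logits, labels):
    # Unmasked positions carry label -100 and are ignored by the loss.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)

if __name__ == "__main__":
    # Toy usage: 2 utterances, 8 subword tokens and 12 phoneme labels each.
    model = JointTextPhoneEncoder()
    text = torch.randint(0, 30000, (2, 8))
    phones = torch.randint(0, 100, (2, 12))
    text_logits, phone_logits = model(text, phones)
    print(text_logits.shape, phone_logits.shape)
```

The single shared encoder is the key design choice in this sketch: because textual and phonetic positions attend to one another during masked prediction, the learned representation can fall back on phonetic cues when the textual stream contains ASR errors.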
Pages: 1244-1248
Number of pages: 5