Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

Cited by: 5
Authors
Chen, Qian [1 ]
Wang, Wen [1 ]
Zhang, Qinglin [1 ]
Affiliations
[1] Alibaba Group, Speech Lab, Hangzhou, Zhejiang, China
Source
Interspeech 2021
Keywords
spoken language understanding; pre-training; joint text and speech representation learning; NETWORKS; SPEECH; ASR;
DOI
10.21437/Interspeech.2021-234
Chinese Library Classification (CLC) codes
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
In the traditional cascading architecture for spoken language understanding (SLU), automatic speech recognition (ASR) errors have been observed to degrade the performance of natural language understanding. End-to-end (E2E) SLU models have been proposed to map speech input directly to the desired semantic frame with a single model, thereby mitigating ASR error propagation. Recently, pre-training techniques have been explored for these E2E models. In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors. We explore phoneme labels as high-level speech features, and we design and compare pre-training tasks based on conditional masked language model objectives and inter-sentence relation objectives. We also investigate the efficacy of combining textual and phonetic information during fine-tuning. Experimental results on the spoken language understanding benchmarks Fluent Speech Commands and SNIPS show that the proposed approach significantly outperforms strong baseline models and improves the robustness of spoken language understanding to ASR errors.
Pages: 1244-1248
Page count: 5
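The abstract above centers on a conditional masked language model objective over paired text and phoneme streams. Below is a minimal, hypothetical PyTorch sketch of what such a joint textual-phonetic masked-LM pre-training step could look like: the text stream is partially masked and must be recovered while self-attention also sees the parallel phoneme stream. All module names, vocabulary sizes, and hyperparameters here are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of conditional masked-LM pre-training over a joint
# text + phoneme sequence. Sizes and vocabularies are placeholders.
import torch
import torch.nn as nn

class JointTextPhoneticEncoder(nn.Module):
    def __init__(self, text_vocab=30000, phone_vocab=100, d_model=256,
                 nhead=4, num_layers=4, max_len=512):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.phone_emb = nn.Embedding(phone_vocab, d_model)
        self.type_emb = nn.Embedding(2, d_model)   # 0 = text stream, 1 = phoneme stream
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.mlm_head = nn.Linear(d_model, text_vocab)

    def forward(self, text_ids, phone_ids):
        # Concatenate the two streams into one sequence so that self-attention
        # can condition masked-text prediction on the phonetic context.
        t = self.text_emb(text_ids) + self.type_emb(torch.zeros_like(text_ids))
        p = self.phone_emb(phone_ids) + self.type_emb(torch.ones_like(phone_ids))
        x = torch.cat([t, p], dim=1)
        pos = torch.arange(x.size(1), device=x.device).unsqueeze(0)
        h = self.encoder(x + self.pos_emb(pos))
        # Only the text positions are scored by the masked-LM head.
        return self.mlm_head(h[:, :text_ids.size(1)])

def conditional_mlm_loss(model, text_ids, phone_ids, mask_id=103, mask_prob=0.15):
    """Mask a fraction of text tokens and predict them given the phoneme stream."""
    labels = text_ids.clone()
    mask = torch.rand_like(text_ids, dtype=torch.float) < mask_prob
    mask[:, 0] = True                       # ensure at least one masked position in this toy example
    labels[~mask] = -100                    # ignore unmasked positions in the loss
    corrupted = text_ids.masked_fill(mask, mask_id)
    logits = model(corrupted, phone_ids)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100)

# Toy usage with random ids standing in for a tokenized utterance and its phonemes.
model = JointTextPhoneticEncoder()
text = torch.randint(200, 30000, (2, 16))
phones = torch.randint(1, 100, (2, 24))
loss = conditional_mlm_loss(model, text, phones)
loss.backward()
```

In this toy setup the two streams are merged with segment-type embeddings so masked-text prediction can draw on phonetic context; the paper's full training recipe, including the inter-sentence relation objective, is not reproduced here.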
Related papers (showing items [21]-[30] of 50)
  • [21] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019, Vol. 1, pp. 4171-4186.
  • [22] Sun, Y., Wang, S., Li, Y., Feng, S., Tian, H., Wu, H., Wang, H. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding. AAAI 2020, Vol. 34, pp. 8968-8975.
  • [23] Wang, W., Yang, Z., Xu, B., Li, J., Sun, Y. ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation. ICCV 2023, pp. 3135-3146.
  • [24] Jian, Y., Gao, C., Vosoughi, S. Bootstrapping Vision-Language Learning with Decoupled Language Pre-training. NeurIPS 36, 2023.
  • [25] Guo, Z., Sharma, P., Martinez, A., Du, L., Abraham, R. Multilingual Molecular Representation Learning via Contrastive Pre-training. ACL 2022, Vol. 1 (Long Papers), pp. 3441-3453.
  • [26] Cai, Y., Zhang, C., Shen, W., Zhang, X., Ruan, W., Huang, L. RePreM: Representation Pre-training with Masked Model for Reinforcement Learning. AAAI 2023, Vol. 37, No. 6, pp. 6879-6887.
  • [27] Zhang, R., Wu, H., Li, W., Jiang, D., Zou, W., Li, X. Transformer Based Unsupervised Pre-training for Acoustic Representation Learning. ICASSP 2021, pp. 6933-6937.
  • [28] Zhang, R., Pang, C., Zhang, C., Wang, S., He, Z., Sun, Y., Wu, H., Wang, H. Correcting Chinese Spelling Errors with Phonetic Pre-training. Findings of ACL-IJCNLP 2021, pp. 2250-2261.
  • [29] You, H., Zhou, L., Xiao, B., Codella, N., Cheng, Y., Xu, R., Chang, S.-F., Yuan, L. Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training. ECCV 2022, Part XXVII, LNCS 13687, pp. 69-87.
  • [30] Sun, Z., Huang, J., Lin, J., Xu, X., Chen, Q., Hu, W. Joint Pre-training and Local Re-training: Transferable Representation Learning on Multi-source Knowledge Graphs. KDD 2023, pp. 2132-2144.