Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

Cited by: 5
Authors
Chen, Qian [1 ]
Wang, Wen [1 ]
Zhang, Qinglin [1 ]
Affiliations
[1] Alibaba Group, Speech Lab, Hangzhou, Zhejiang, People's Republic of China
Source
INTERSPEECH 2021
Keywords
spoken language understanding; pre-training; joint text and speech representation learning; NETWORKS; SPEECH; ASR
DOI
10.21437/Interspeech.2021-234
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
In the traditional cascading architecture for spoken language understanding (SLU), automatic speech recognition (ASR) errors have been observed to be detrimental to the performance of natural language understanding. End-to-end (E2E) SLU models have been proposed to directly map speech input to the desired semantic frame with a single model, thereby mitigating ASR error propagation. Recently, pre-training techniques have been explored for these E2E models. In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors. We explore phoneme labels as high-level speech features, and design and compare pre-training tasks based on conditional masked language model objectives and inter-sentence relation objectives. We also investigate the efficacy of combining textual and phonetic information during fine-tuning. Experimental results on the SLU benchmarks Fluent Speech Commands and SNIPS show that the proposed approach significantly outperforms strong baseline models and improves the robustness of spoken language understanding to ASR errors.
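As a rough illustration of the conditional masked language model objective described in the abstract, the sketch below packs a word sequence and its phoneme sequence into a single input and masks tokens on only one side, so a model would have to recover them from the unmasked modality. The packing layout ([CLS] words [SEP] phonemes [SEP]), the masking rate, and the build_conditional_mlm_example helper are assumptions for illustration only, not the authors' actual implementation.

```python
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"

def build_conditional_mlm_example(words, phonemes, mask_side="text",
                                  mask_prob=0.15, seed=0):
    """Build one joint textual-phonetic training example (illustrative only).

    Both modalities are packed into a single sequence
    [CLS] word tokens [SEP] phoneme tokens [SEP]; tokens on the chosen
    side are replaced with [MASK] and become prediction targets, while
    the other side stays intact as conditioning context (the
    "conditional" part of the masked LM objective).
    """
    rng = random.Random(seed)
    tokens = [CLS] + list(words) + [SEP] + list(phonemes) + [SEP]
    # Segment ids distinguish the textual span (0) from the phonetic span (1).
    segments = [0] * (len(words) + 2) + [1] * (len(phonemes) + 1)

    if mask_side == "text":
        candidates = range(1, 1 + len(words))
    else:  # mask the phoneme span instead, conditioning on the text
        candidates = range(2 + len(words), 2 + len(words) + len(phonemes))

    inputs, labels = list(tokens), [None] * len(tokens)
    for i in candidates:
        if rng.random() < mask_prob:
            labels[i] = tokens[i]   # original token becomes the target
            inputs[i] = MASK        # model must recover it from the other modality
    return inputs, segments, labels

if __name__ == "__main__":
    words = "turn on the lights".split()
    phones = "T ER1 N AA1 N DH AH L AY1 T S".split()
    x, seg, y = build_conditional_mlm_example(words, phones,
                                              mask_side="text", mask_prob=0.5)
    print(x)
    print(y)
```

In this hypothetical setup, an inter-sentence relation objective could additionally be trained by pairing a text span with a matching or mismatching phoneme span and predicting whether the two sides belong together; the abstract does not specify these details, so the sketch should be read only as one plausible instantiation.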
Pages: 1244-1248
Number of pages: 5