Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

Cited by: 5
Authors: Chen, Qian [1]; Wang, Wen [1]; Zhang, Qinglin [1]
Affiliations: [1] Alibaba Group, Speech Lab, Hangzhou, Zhejiang, People's Republic of China
Source: INTERSPEECH 2021
Keywords: spoken language understanding; pre-training; joint text and speech representation learning; NETWORKS; SPEECH; ASR
DOI: 10.21437/Interspeech.2021-234
Abstract
In the traditional cascading architecture for spoken language understanding (SLU), automatic speech recognition (ASR) errors have been observed to be detrimental to the performance of natural language understanding. End-to-end (E2E) SLU models have been proposed to map speech input directly to the desired semantic frame with a single model, thereby mitigating ASR error propagation. Recently, pre-training techniques have been explored for these E2E models. In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors. We explore phoneme labels as high-level speech features, and design and compare pre-training tasks based on conditional masked language model objectives and inter-sentence relation objectives. We also investigate the efficacy of combining textual and phonetic information during fine-tuning. Experimental results on the spoken language understanding benchmarks Fluent Speech Commands and SNIPS show that the proposed approach significantly outperforms strong baseline models and improves the robustness of spoken language understanding to ASR errors.
Pages: 1244-1248 (5 pages)
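
The abstract describes a conditional masked language model objective over paired textual and phonetic sequences. Below is a minimal, illustrative PyTorch sketch of that general idea, not the authors' implementation: it assumes a shared Transformer encoder over a concatenated subword/phoneme sequence with segment embeddings, and masks positions in one modality so the model must predict them from the other. All module names, shapes, and hyperparameters here are assumptions.

# Minimal sketch (assumed, not the paper's code) of a conditional masked LM
# over joint text + phoneme input: a shared Transformer encoder reads a
# concatenated subword/phoneme sequence; positions of one modality are
# masked and predicted, so the model conditions on the other modality.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000          # illustrative: subwords + phonemes + specials
PAD_ID, MASK_ID = 0, 1

class JointTextPhoneEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD_ID)
        self.seg = nn.Embedding(2, d_model)      # 0 = text, 1 = phoneme
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, ids, seg_ids):
        pos_ids = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        h = self.tok(ids) + self.seg(seg_ids) + self.pos(pos_ids)
        h = self.encoder(h, src_key_padding_mask=ids.eq(PAD_ID))
        return self.mlm_head(h)

def mask_modality(ids, seg_ids, target_seg, p=0.15):
    """Mask ~p of the positions belonging to one modality; return the
    corrupted input and MLM labels (-100 at positions not predicted)."""
    maskable = seg_ids.eq(target_seg) & ids.ne(PAD_ID)
    chosen = maskable & (torch.rand(ids.shape, device=ids.device) < p)
    labels = ids.masked_fill(~chosen, -100)
    return ids.masked_fill(chosen, MASK_ID), labels

# One illustrative training step on random data: mask the phoneme half
# so it must be reconstructed from the textual half (a symmetric pass
# masking the text half instead is omitted here).
model = JointTextPhoneEncoder()
ids = torch.randint(2, VOCAB_SIZE, (8, 64))              # fake joint batch
seg_ids = (torch.arange(64) >= 32).long().expand(8, -1)  # text | phoneme
inputs, labels = mask_modality(ids, seg_ids, target_seg=1)
logits = model(inputs, seg_ids)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1), ignore_index=-100)
loss.backward()

In the paper's setting, the phoneme side of each pair would come from phoneme labels treated as high-level speech features rather than the random IDs used in this toy batch.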
Related Papers (50 total; entries [41]-[50] shown)
  • [41] GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest
    Gao, Yunfan
    Xiong, Yun
    Wang, Siqi
    Wang, Haofen
    APPLIED SCIENCES-BASEL, 2022, 12 (24):
  • [42] Understanding tables with intermediate pre-training
    Eisenschlos, Julian Martin
    Krichene, Syrine
    Mueller, Thomas
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020,
  • [43] COUPLED REPRESENTATION LEARNING FOR DOMAINS, INTENTS AND SLOTS IN SPOKEN LANGUAGE UNDERSTANDING
    Lee, Jihwan
    Kim, Dongchan
    Sarikaya, Ruhi
    Kim, Young-Bum
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 714 - 719
  • [44] Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
    Clark, Jonathan H.
    Garrette, Dan
    Turc, Iulia
    Wieting, John
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 73 - 91
  • [45] Textual Supervision for Visually Grounded Spoken Language Understanding
    Higy, Bertrand
    Elliott, Desmond
    Chrupala, Grzegorz
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2698 - 2709
  • [46] PHONETIC WORD - A BASIS FOR UNDERSTANDING AND LEARNING THE SPOKEN FRENCH
    Defterdarevic-Muradbegovic, Almasa
    GOVOR, 2008, 25 (01) : 3 - 30
  • [47] A JOINT MULTI-TASK LEARNING FRAMEWORK FOR SPOKEN LANGUAGE UNDERSTANDING
    Li, Changliang
    Kong, Cunliang
    Zhao, Yan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6054 - 6058
  • [48] Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
    Wang, Tzu-Jui Julius
    Laaksonen, Jorma
    Langer, Tomas
    Arponen, Heikki
    Bishop, Tom E.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1073 - 1083
  • [49] MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding
    Li, Junlong
    Xu, Yiheng
    Cui, Lei
    Wei, Furu
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6078 - 6087
  • [50] Better Pre-Training by Reducing Representation Confusion
    Zhang, Haojie
    Liang, Mingfei
    Xie, Ruobing
    Sun, Zhenlong
    Zhang, Bo
    Lin, Leyu
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 2325 - 2336