LEARNING ASR-ROBUST CONTEXTUALIZED EMBEDDINGS FOR SPOKEN LANGUAGE UNDERSTANDING

Times cited: 0
Authors
Huang, Chao-Wei [1]
Chen, Yun-Nung [1]
Affiliation
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords
spoken language understanding; contextualized embedding; ASR robustness; recurrent neural networks
DOI
10.1109/icassp40776.2020.9054689
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Employing pre-trained language models (LMs) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by an automatic speech recognizer (ASR) raises concerns about robustness to recognition errors. Therefore, this paper focuses on making contextualized representations more ASR-robust. We propose a novel confusion-aware fine-tuning method to mitigate the impact of ASR errors on pre-trained LMs. Specifically, we fine-tune LMs to produce similar representations for acoustically confusable words, which are obtained from the word confusion networks (WCNs) produced by the ASR. Experiments on multiple benchmark datasets show that the proposed method significantly improves spoken language understanding performance on ASR transcripts(1).
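The confusion-aware idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption rather than the authors' method: a WCN is modeled as a list of bins of (word, posterior) hypotheses with `<eps>` marking deletions; distinct words sharing a bin are treated as acoustically confusable; the pull-together term is a posterior-weighted squared Euclidean distance; and a static 2-d embedding table stands in for the contextualized representations a pre-trained LM would produce. The names `confusable_pairs`, `confusion_loss`, and `sgd_step` are hypothetical, and the paper's actual objective and training procedure may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical WCN representation: a list of bins (time slots), each bin a
# list of competing (word, posterior) hypotheses; "<eps>" marks a deletion.
WCN = List[List[Tuple[str, float]]]
Pair = Tuple[str, str, float]


def confusable_pairs(wcn: WCN, min_posterior: float = 0.1) -> List[Pair]:
    """Collect (word_a, word_b, weight) for distinct words sharing a bin.

    Hypotheses below min_posterior and epsilon arcs are skipped; the weight
    is the product of the two posteriors (an illustrative choice).
    """
    pairs: List[Pair] = []
    for bin_ in wcn:
        kept = [(w, p) for w, p in bin_ if p >= min_posterior and w != "<eps>"]
        for (a, pa), (b, pb) in combinations(kept, 2):
            if a != b:
                pairs.append((a, b, pa * pb))
    return pairs


def confusion_loss(emb: Dict[str, List[float]], pairs: List[Pair]) -> float:
    """Posterior-weighted mean squared Euclidean distance over confusable pairs."""
    total = sum(w * sum((x - y) ** 2 for x, y in zip(emb[a], emb[b]))
                for a, b, w in pairs)
    norm = sum(w for _, _, w in pairs)
    return total / norm if norm > 0 else 0.0


def sgd_step(emb: Dict[str, List[float]], pairs: List[Pair],
             lr: float = 0.1) -> Dict[str, List[float]]:
    """Return new embeddings after one gradient-descent step on confusion_loss."""
    norm = sum(w for _, _, w in pairs)
    if norm == 0:
        return {k: list(v) for k, v in emb.items()}
    grads = {k: [0.0] * len(v) for k, v in emb.items()}
    for a, b, w in pairs:
        for i, (x, y) in enumerate(zip(emb[a], emb[b])):
            g = 2.0 * w * (x - y) / norm  # d/dx of w * (x - y)^2 / norm
            grads[a][i] += g
            grads[b][i] -= g  # opposite sign for the other word in the pair
    return {k: [x - lr * g for x, g in zip(v, grads[k])] for k, v in emb.items()}


# Toy WCN for "show flights to ..." with acoustically confusable alternatives.
wcn: WCN = [
    [("show", 0.9), ("<eps>", 0.1)],
    [("flights", 0.6), ("fights", 0.3), ("lights", 0.1)],
    [("to", 0.7), ("two", 0.3)],
]
# Toy 2-d static embeddings standing in for contextualized LM outputs.
emb = {
    "show": [0.0, 1.0], "flights": [1.0, 0.0], "fights": [0.0, 0.0],
    "lights": [1.0, 1.0], "to": [0.5, 0.5], "two": [-0.5, 0.5],
}

pairs = confusable_pairs(wcn)
before = confusion_loss(emb, pairs)
after = confusion_loss(sgd_step(emb, pairs), pairs)
print(f"{len(pairs)} pairs; loss {before:.4f} -> {after:.4f}")
```

Words with no competitors in their bin (here "show") receive zero gradient and are left unchanged, so the term only moves representations of words the recognizer actually confused. In a realistic setup the vectors would presumably be the LM's contextualized outputs for each hypothesis in its sentence context, and the term would be minimized by backpropagating through the LM rather than by editing a lookup table directly.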
Pages: 8009-8013
Page count: 5
Related papers (50 in total)
  • [41] An Active Learning Approach for Statistical Spoken Language Understanding
    Garcia, Fernando
    Hurtado, Lluis-F.
    Sanchis, Emilio
    Segarra, Encarna
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, 2011, 7042: 565-572
  • [42] UNDERSTANDING SPOKEN LANGUAGE
    BROWN, G
    TESOL QUARTERLY, 1978, 12 (03): 271-283
  • [43] Spoken language understanding
    Wang, YY
    Deng, L
    Acero, A
    IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22 (05): 16-31
  • [44] RETRIEVING THE SYNTACTIC STRUCTURE OF ERRONEOUS ASR TRANSCRIPTIONS FOR OPEN-DOMAIN SPOKEN LANGUAGE UNDERSTANDING
    Bechet, Frederic
    Favre, Benoit
    Nasr, Alexis
    Morey, Mathieu
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [45] Beyond ASR 1-best: Using word confusion networks in spoken language understanding
    Hakkani-Tur, Dilek
    Bechet, Frederic
    Riccardi, Giuseppe
    Tur, Gokhan
    COMPUTER SPEECH AND LANGUAGE, 2006, 20 (04): 495-514
  • [46] GROUPWISE LEARNING FOR ASR K-BEST LIST RERANKING IN SPOKEN LANGUAGE TRANSLATION
    Ng, Raymond W. M.
    Shah, Kashif
    Specia, Lucia
    Hain, Thomas
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 6120-6124
  • [47] Japanese ASR-Robust Pre-trained Language Model with Pseudo-Error Sentences Generated by Grapheme-Phoneme Conversion
    Ohsugi, Yasuhito
    Saito, Itsumi
    Nishida, Kyosuke
    Yoshida, Sen
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2022, 2022-September: 2688-2692
  • [48] WHAT IS BEST FOR SPOKEN LANGUAGE UNDERSTANDING: SMALL BUT TASK-DEPENDANT EMBEDDINGS OR HUGE BUT OUT-OF-DOMAIN EMBEDDINGS?
    Ghannay, Sahar
    Neuraz, Antoine
    Rosset, Sophie
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 8114-8118
  • [49] Distributionally Robust Finetuning BERT for Covariate Drift in Spoken Language Understanding
    Broscheit, Samuel
    Quynh Do
    Gaspers, Judith
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 1970-1985
  • [50] An Effective and Robust Approach to Mandarin Spoken Language Understanding in Specific Domain
    He, Zhiyang
    Lv, Ping
    Wu, Ji
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014: 604+