DEEP CONTEXTUALIZED ACOUSTIC REPRESENTATIONS FOR SEMI-SUPERVISED SPEECH RECOGNITION

Cited by: 0
Authors
Ling, Shaoshi [1]
Liu, Yuzong [1]
Salazar, Julian [1]
Kirchhoff, Katrin [1]
Affiliations
[1] Amazon AWS AI, Seattle, WA 98109 USA
Keywords
speech recognition; acoustic representation learning; semi-supervised learning; FRAMEWORK;
DOI
10.1109/icassp40776.2020.9053176
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose a novel approach to semi-supervised automatic speech recognition (ASR). We first exploit a large amount of unlabeled audio data via representation learning, where we reconstruct a temporal slice of filterbank features from past and future context frames. The resulting deep contextualized acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end ASR system using a smaller amount of labeled audio data. In our experiments, we show that systems trained on DeCoAR consistently outperform ones trained on conventional filterbank features, giving 42% and 19% relative improvement over the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our approach can drastically reduce the amount of labeled data required; unsupervised training on LibriSpeech then supervision with 100 hours of labeled data achieves performance on par with training on all 960 hours directly.
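The reconstruction objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the encoders, the loss, or the slice length, so everything below is an assumption made for illustration: the running means stand in for learned past and future context encoders, the predictor is a single linear map, the loss is a mean absolute error, and the boundary frames of each slice are included in the context. All function names are hypothetical.

```python
import numpy as np


def slice_targets(features, slice_len):
    """Flatten each temporal slice features[t : t + slice_len] into one target row."""
    num_slices = features.shape[0] - slice_len + 1
    return np.stack(
        [features[t:t + slice_len].reshape(-1) for t in range(num_slices)]
    )


def context_summaries(features, slice_len):
    """Assumed stand-in for past/future encoders: running means of the frames.

    past[t]   = mean of frames 0..t        (context up to the slice start)
    future[t] = mean of frames t..T-1      (context from the slice end onward)
    """
    num_frames = features.shape[0]
    num_slices = num_frames - slice_len + 1
    counts = np.arange(1, num_frames + 1)[:, None]
    past = np.cumsum(features, axis=0) / counts
    future = (np.cumsum(features[::-1], axis=0) / counts)[::-1]
    # Pair the past summary at the slice start with the future summary at the slice end.
    return np.concatenate([past[:num_slices], future[slice_len - 1:]], axis=1)


def reconstruction_loss(features, slice_len, weights):
    """Mean absolute error between predicted and true slices (assumed loss)."""
    predicted = context_summaries(features, slice_len) @ weights
    return float(np.mean(np.abs(predicted - slice_targets(features, slice_len))))


# Toy usage: 20 frames of 4-dimensional "filterbank" features, slices of 3 frames.
rng = np.random.default_rng(0)
feats = rng.standard_normal((20, 4))
slice_len = 3
weights = np.zeros((2 * feats.shape[1], slice_len * feats.shape[1]))
print(reconstruction_loss(feats, slice_len, weights))
```

In a real system the weights (and the context encoders) would be trained on unlabeled audio to minimize this loss, and the encoder outputs would then serve as input features for the supervised ASR model.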
Pages: 6429 - 6433
Number of pages: 5
Related papers
50 in total
  • [41] SEMI-SUPERVISED DEEP LEARNING REPRESENTATIONS IN EARTH OBSERVATION BASED FOREST MANAGEMENT
    Antropov, Oleg
    Molinier, Matthieu
    Kuzu, Ridvan Salih
    Hughes, Lloyd
    Russwurm, Marc
    Tuia, Devis
    Dumitru, Corneliu Octavian
    Ge, Shaojia
    Saha, Sudipan
    Zhu, Xiao Xiang
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 650 - 653
  • [42] An active semi-supervised deep learning model for human activity recognition
    Bi, Haixia
    Perello-Nieto, Miquel
    Santos-Rodriguez, Raul
    Flach, Peter
    Craddock, Ian
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2022, 14 (10) : 13049 - 13065
  • [43] An active semi-supervised deep learning model for human activity recognition
    Haixia Bi
    Miquel Perello-Nieto
    Raul Santos-Rodriguez
    Peter Flach
    Ian Craddock
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 13049 - 13065
  • [44] Semi-supervised Learning of Deep Difference Features for Facial Expression Recognition
    Xu, Can
    Xu, Ruyi
    Chen, Jingying
    Liu, Leyuan
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 245 - 254
  • [45] Deep Recurrent Semi-Supervised EEG Representation Learning for Emotion Recognition
    Zhang, Guangyi
    Etemad, Ali
    2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2021,
  • [46] SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION USING TEXT-TO-SPEECH AND AUTOENCODERS
    Karita, Shigeki
    Watanabe, Shinji
    Iwata, Tomoharu
    Delcroix, Marc
    Ogawa, Atsunori
    Nakatani, Tomohiro
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6166 - 6170
  • [47] Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
    Oh, Yoori
    Lee, Juheon
    Han, Yoseob
    Lee, Kyogu
    INTERSPEECH 2023, 2023, : 4818 - 4822
  • [48] Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition
    Latif, Siddique
    Rana, Rajib
    Khalifa, Sara
    Jurdak, Raja
    Epps, Julien
    Schuller, Bjoern W.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (02) : 992 - 1004
  • [49] Semi-supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 688 - +
  • [50] Graph Based Semi-supervised Learning Methods Applied to Speech Recognition Problem
    Hoang Trang
    Tran, Loc Hoang
    NATURE OF COMPUTATION AND COMMUNICATION, 2015, 144 : 264 - 273