Exploiting foreign resources for DNN-based ASR

Cited by: 9
Authors
Motlicek, Petr [1]
Imseng, David [1]
Potard, Blaise [1]
Garner, Philip N. [1]
Himawan, Ivan [1]
Affiliation
[1] Idiap Res Inst, Martigny, Switzerland
Keywords
Automatic speech recognition; Deep learning for speech; Acoustic model adaptation; Semi-supervised training; SPEECH; ALGORITHM; FEATURES;
DOI
10.1186/s13636-015-0058-5
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when using such data as an auxiliary source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best performing HMM/DNN hybrid by about 5% relative (in terms of word error rate, WER). The accumulated relative gain with respect to the MFCC-HMM/GMM baseline is about 30% in WER.
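The abstract reports its gains as relative WER reductions. As a minimal sketch of how such figures are commonly computed, the Python below derives WER from a word-level edit distance and then the relative reduction between two systems. The function names and the numeric values are illustrative assumptions; they are not taken from the paper, whose absolute WERs are not given in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs (assumed for illustration only, not results from the paper).
hybrid_wer = 20.0
tandem_wer = 19.0
print(relative_wer_reduction(hybrid_wer, tandem_wer))  # 5.0
```

Under these assumed numbers, a drop from 20.0% to 19.0% WER is a 1-point absolute gain but a 5% relative gain, which is the sense in which "5% relative" is used in the abstract.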
Pages: 1 / 10
Number of pages: 10
Related papers
50 items in total
  • [1] Exploiting foreign resources for DNN-based ASR
    Petr Motlicek
    David Imseng
    Blaise Potard
    Philip N. Garner
    Ivan Himawan
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [2] Preliminary experiments on the robustness of biologically motivated features for DNN-based ASR
    de-la-Calle-Silos, F.
    Valverde-Albacete, Francisco J.
    Gallardo-Antolin, A.
    Pelaez-Moreno, C.
    2015 4TH INTERNATIONAL WORK CONFERENCE ON BIOINSPIRED INTELLIGENCE (IWOBI), 2015: 169-175
  • [3] Delta-MelSpectra Features for Noise Robustness to DNN-based ASR systems
    Kumar, Kshitiz
    Liu, Chaojun
    Gong, Yifan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2445-2448
  • [4] Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-based ASR System
    Arai, Kenichi
    Araki, Shoko
    Ogawa, Atsunori
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    Irino, Toshio
    INTERSPEECH 2020, 2020: 1156-1160
  • [5] Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-based ASR System
    Arai, Kenichi
    Araki, Shoko
    Ogawa, Atsunori
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    Yamamoto, Katsuhiko
    Irino, Toshio
    INTERSPEECH 2019, 2019: 4275-4279
  • [6] DNN-based interference mitigation beamformer
    Ramezanpour, Parham
    Mosavi, Mohammad Reza
    IET RADAR SONAR AND NAVIGATION, 2020, 14 (11): 1788-1794
  • [7] EXPLOITING SPECTRO-TEMPORAL STRUCTURES USING NMF FOR DNN-BASED SUPERVISED SPEECH SEPARATION
    Nie, Shuai
    Liang, Shan
    Li, Hao
    Zhang, XueLiang
    Yang, ZhanLei
    Liu, WenJu
    Dong, LiKe
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 469-473
  • [8] DNN-based Residual Echo Suppression
    Lee, Chul Min
    Shine, Jong Won
    Kim, Nam Soo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 1775-1779
  • [9] DNN-Based Arabic Speech Synthesis
    Amrouche, Aissa
    Bentrcia, Youssouf
    Boubakeur, Khadidja Nesrine
    Abed, Ahcene
    2022 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2022), 2022: 378-382
  • [10] DNN-based speaker clustering for speaker diarisation
    Milner, Rosanna
    Hain, Thomas
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2185-2189