SUBMODULAR DATA SELECTION WITH ACOUSTIC AND PHONETIC FEATURES FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Ni, Chongjia [1]
Wang, Lei [1]
Liu, Haibo [2]
Leung, Cheung-Chi [1]
Lu, Li [2]
Ma, Bin [1]
Affiliations
[1] A*STAR, Institute for Infocomm Research (I2R), Singapore, Singapore
[2] Tencent Inc., Beijing, People's Republic of China
Keywords
Active learning; data selection; automatic speech recognition; submodular optimization
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we propose using acoustic-feature-based submodular function optimization to select a subset of untranscribed data for manual transcription, and then retraining the initial acoustic model with the additional transcribed data. The acoustic features are obtained from an unsupervised Gaussian mixture model. We also integrate the acoustic features with phonetic features, which are obtained from an initial ASR system, in the submodular function. Submodular function optimization has a theoretically proven near-optimal guarantee. We performed experiments on 1000 hours of Mandarin mobile phone speech, of which 300 hours of initial data were used to train an initial acoustic model. The experimental results show that the acoustic-feature-based approach, which does not rely on an initial ASR system, performs as well as the phonetic-feature-based approach. Moreover, the acoustic-feature-based and phonetic-feature-based data selection methods are complementary. The submodular function with the combined features provides a 4.8% relative character error rate (CER) reduction over the corresponding ASR system using random selection. We also include the desired feature distribution obtained from a development set in a generalized function, but the improvement is insignificant.
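The selection step described in the abstract can be illustrated with a minimal sketch of greedy submodular maximization. The abstract does not give the exact objective, so the sketch assumes a common feature-based form, f(S) = sum over features u of w_u * sqrt(sum over utterances i in S of m_u(i)), and a simple limit on the number of selected utterances. The names `feature_based_value`, `greedy_select`, `features`, `weights`, and `k` are hypothetical and are not taken from the paper. For a monotone submodular objective under such a cardinality limit, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the kind of near-optimal guarantee the abstract refers to.

```python
import math


def feature_based_value(selected, features, weights):
    """Assumed objective: f(S) = sum_u w_u * sqrt(sum_{i in S} m_u(i)).

    features[i][u] is the non-negative score m_u(i) of feature u in
    utterance i (for example, an acoustic or phonetic unit count).
    A concave function of a non-negative sum is monotone submodular.
    """
    totals = [0.0] * len(weights)
    for i in selected:
        for u, score in enumerate(features[i]):
            totals[u] += score
    return sum(w * math.sqrt(t) for w, t in zip(weights, totals))


def greedy_select(features, weights, k):
    """Pick up to k utterances, each time adding the largest marginal gain."""
    selected = []
    remaining = set(range(len(features)))
    current = 0.0
    while remaining and len(selected) < k:
        best, best_gain = None, 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = feature_based_value(selected + [i], features, weights) - current
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no candidate adds any value
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Hypothetical toy data: utterances 0 and 1 cover the same feature,
# utterance 2 covers a different one.
features = [[4.0, 0.0], [4.0, 0.0], [0.0, 1.0]]
weights = [1.0, 1.0]
print(greedy_select(features, weights, k=2))
```

On this toy data the sketch selects utterances 0 and 2: once utterance 0 is chosen, the redundant utterance 1 adds less value than utterance 2, which covers a feature not yet represented. This diminishing-returns behavior is what makes a submodular objective favor diverse selections over random or purely score-ranked ones.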
Pages: 4629-4633
Number of pages: 5
Related papers
50 in total
  • [1] Phonetic Features Enhancement for Bangla Automatic Speech Recognition
    Kabir, Sharif M. Rasel
    Hassan, Foyzul
    Ahamed, Foysal
    Mamun, Khondokar
    Huda, Mohammad Nurul
    Nusrat, Fariha
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION ENGINEERING (ICCIE), 2015: 25-28
  • [2] MARKOV MODEL ACOUSTIC PHONETIC COMPONENT FOR AUTOMATIC SPEECH RECOGNITION
    TAPPERT, CC
    [J]. INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1977, 9(3): 363-373
  • [3] Automatic assessments of dysarthric speech: the usability of acoustic-phonetic features
    van Bemmel, Loes
    Pesenti, Chiara
    Wei, Xue
    Strik, Helmer
    [J]. INTERSPEECH 2023, 2023: 141-145
  • [4] Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition
    Doulaty, Mortaza Morrie
    Hain, Thomas
    [J]. INTERSPEECH 2019, 2019: 3228-3232
  • [5] AUTOMATIC RECOGNITION OF PHONETIC PATTERNS IN SPEECH
    DUDLEY, H
    BALASHEK, S
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1958, 30(8): 721-732
  • [6] AUTOMATIC RECOGNITION OF PHONETIC ELEMENTS IN SPEECH
    DAVIS, KH
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1953, 25(4): 832
  • [7] PHONETIC FEATURES AND ACOUSTIC INVARIANCE IN SPEECH
    BLUMSTEIN, SE
    STEVENS, KN
    [J]. COGNITION, 1981, 10(1-3): 25-32
  • [8] A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition
    Juneja, Amit
    Espy-Wilson, Carol
    [J]. Journal of the Acoustical Society of America, 2008, 123(2): 1154-1168
  • [9] Incorporating finer acoustic phonetic features in lexicon for Hindi language speech recognition
    Patil, Atul
    More, Prashant
    Sasikumar, M.
    [J]. JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2019, 40(8): 1731-1739
  • [10] AUTOMATIC SELECTION OF SPEAKERS FOR IMPROVED ACOUSTIC MODELLING: RECOGNITION OF DISORDERED SPEECH WITH SPARSE DATA
    Christensen, H.
    Casanueva, I.
    Cunningham, S.
    Green, P.
    Hain, T.
    [J]. 2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014: 254-259