NOISE ROBUST SPEECH RECOGNITION ON AURORA4 BY HUMANS AND MACHINES

Authors
Qian, Yanmin [1, 2]
Tan, Tian [1 ]
Hu, Hu [1 ]
Liu, Qi [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA
Keywords
robust speech recognition; very deep convolution residual network; cluster adaptive training; future-vector
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Although great progress has been made in automatic speech recognition (ASR), significant performance degradation still occurs in noisy environments. Building on our previously introduced very deep CNNs, this paper further integrates residual learning to form a very deep convolutional residual network (VDCRN) and evaluates it in noisy conditions, where it shows greater robustness. Cluster adaptive training (CAT) is then developed on the VDCRN to reduce the mismatch between training and testing in noisy scenarios. Moreover, an advanced future-vector assisted LSTM-RNN language model (LM) is proposed to achieve a further gain. All the proposed approaches are evaluated on Aurora4, and each yields a significant improvement. The final system achieves a 3.09% word error rate (WER) on Aurora4, which approaches human performance on this task. This is a new milestone for noise-robust ASR on this benchmark.
Pages: 5604-5608
Number of pages: 5