Time-Frequency Kernel-Based CNN for Speech Recognition

Citations: 0
Authors
Zhao, Tuo [1 ]
Zhao, Yunxin [1 ]
Chen, Xin [2 ]
Affiliations
[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
[2] Pearson Knowledge Technol, Menlo Pk, CA 94025 USA
Keywords
time-frequency kernels; convolutional neural network; robust speech recognition;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a novel approach to generate time-frequency kernel-based deep convolutional neural networks (CNN) for robust speech recognition. We give different treatments to shifting along the time and the frequency axes of speech feature representations in the 2D convolution, so as to achieve certain invariance in small frequency shifts while expanding time context size for speech input without smearing time positions of phone segments. The 2D-kernel approach allows easy implementation of deep CNNs. We present experimental results on speaker-independent phone recognition tasks of TIMIT and FFMTIMIT, where the latter was acquired using a far-field microphone and the speech data are noisy. Our results demonstrate that the proposed time-frequency kernel-based CNN gives consistent phone error reductions over frequency-domain CNN and DNN for both TIMIT and FFMTIMIT, with more benefits shown for recognizing noisy speech by using clean speech models.
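The idea of treating the two axes differently can be illustrated with a small sketch. This is one plausible reading of the abstract, not the authors' implementation: it assumes the kernel spans both frequency and time, and that max pooling is applied along frequency only, so small frequency shifts are absorbed while each time position stays distinct. The input size (40 frequency bands by 11 frames), the 8 x 3 kernel, the pooling size of 3, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2D cross-correlation of x (freq, time) with kernel k (kf, kt)."""
    kf, kt = k.shape
    nf, nt = x.shape
    out = np.empty((nf - kf + 1, nt - kt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kf, j:j + kt] * k)
    return out

def max_pool_freq(y, size):
    """Non-overlapping max pooling along the frequency axis (axis 0) only."""
    n = (y.shape[0] // size) * size  # drop any ragged remainder rows
    return y[:n].reshape(-1, size, y.shape[1]).max(axis=1)

def tf_layer(x, k, pool):
    """Convolve over time and frequency, apply ReLU, pool over frequency only."""
    return max_pool_freq(np.maximum(conv2d_valid(x, k), 0.0), pool)

# Hypothetical sizes: 40 frequency bands, 11 frames of time context.
rng = np.random.default_rng(0)
x = rng.standard_normal((40, 11))
k = rng.standard_normal((8, 3))   # kernel covers 8 bands and 3 frames
h = tf_layer(x, k, pool=3)
print(h.shape)  # (11, 9): frequency is pooled 33 -> 11, time keeps all 9 positions
```

Because the time axis is never pooled in this sketch, widening the kernel in time enlarges the context each output sees without merging neighboring time positions.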
Pages: 1888-1892
Page count: 5