The Joint Optimization of Spectro-Temporal Features and Neural Net Classifiers

Authors
Kovacs, Gyoergy [1]
Toth, Laszlo [2]
Affiliations
[1] Univ Szeged, Dept Informat, Szeged, Hungary
[2] Hungarian Acad Sci, Res Grp Artificial Intelligence, Szeged, Hungary
Source
Keywords
spectro-temporal features; neural net; phone recognition; TIMIT
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In speech recognition, spectro-temporal feature extraction and the training of the acoustical model are usually performed separately. To improve recognition performance, we present a combined model which allows the training of the feature extraction filters along with a neural net classifier. Besides expecting that this joint training will result in a better recognition performance, we also expect that such a neural net can generate coefficients for spectro-temporal filters and also enhance preexisting ones, such as those obtained with the two-dimensional Discrete Cosine Transform (2D DCT) and Gabor filters. We tested these assumptions on the TIMIT phone recognition task. The results show that while the initialization based on the 2D DCT or Gabor coefficients is better in some cases than with simple random initialization, the joint model in practice always outperforms the standard two-step method. Furthermore, the results can be significantly improved by using a convolutional version of the network.
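The joint training idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the spectro-temporal filters act as a linear layer on a flattened 9 x 9 spectrogram patch, that this layer is initialized from 2D DCT basis functions, and that a single ReLU hidden layer with a softmax output stands in for the classifier. The patch size, layer sizes, learning rate, and all function names are hypothetical. The point it shows is that one gradient step updates the filter weights together with the classifier weights, instead of keeping the filters fixed as in the two-step method.

```python
import numpy as np


def dct2_filters(n_freq, n_time, n_keep_freq, n_keep_time):
    """Flattened, unnormalized 2D DCT-II basis functions.

    Returns an array of shape (n_keep_freq * n_keep_time, n_freq * n_time).
    """
    def basis(n, k):
        # k-th 1D DCT-II basis vector of length n
        return np.cos(np.pi / n * (np.arange(n) + 0.5) * k)

    return np.stack([
        np.outer(basis(n_freq, u), basis(n_time, v)).ravel()
        for u in range(n_keep_freq)
        for v in range(n_keep_time)
    ])


def init_params(rng, n_freq=9, n_time=9, n_hidden=16, n_classes=4):
    # Assumption: the filter layer starts from 2D DCT coefficients,
    # the classifier layers from small random values.
    filt = dct2_filters(n_freq, n_time, 3, 3)
    return {
        "filt": filt,
        "hid": rng.normal(0.0, 0.1, (n_hidden, filt.shape[0])),
        "out": rng.normal(0.0, 0.1, (n_classes, n_hidden)),
    }


def forward(params, x):
    feats = params["filt"] @ x          # linear spectro-temporal filter layer
    pre = params["hid"] @ feats
    h = np.maximum(pre, 0.0)            # ReLU hidden layer (an assumption)
    logits = params["out"] @ h
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return feats, pre, h, e / e.sum()


def joint_sgd_step(params, x, label, lr=0.01):
    """One cross-entropy SGD step that updates filters and classifier together."""
    feats, pre, h, p = forward(params, x)
    d_logits = p.copy()
    d_logits[label] -= 1.0              # gradient of cross-entropy w.r.t. logits
    d_h = params["out"].T @ d_logits
    d_pre = d_h * (pre > 0)
    d_feats = params["hid"].T @ d_pre   # gradient reaching the filter outputs
    new_params = {
        "out": params["out"] - lr * np.outer(d_logits, h),
        "hid": params["hid"] - lr * np.outer(d_pre, feats),
        "filt": params["filt"] - lr * np.outer(d_feats, x),
    }
    return new_params, -np.log(p[label])


rng = np.random.default_rng(0)
params = init_params(rng)
patch = rng.normal(size=9 * 9)          # stand-in for a flattened spectrogram patch
params_after, loss = joint_sgd_step(params, patch, label=2)
```

Swapping `dct2_filters` for a random or Gabor-based initializer would correspond to the other starting points the abstract compares. Freezing `params["filt"]` in `joint_sgd_step` would recover the conventional two-step setup.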
Pages: 552-559
Number of pages: 8