Continuous Speech Recognition based on Convolutional Neural Network

Cited by: 2
Authors
Zhang, Qing-qing [1]
Liu, Yong [1]
Pan, Jie-lin [1]
Yan, Yong-hong [1]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Beijing 100190, Peoples R China
Source
SEVENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2015) | 2015 / Vol. 9631
Keywords
Convolutional Neural Networks; Continuous speech recognition; Local convolution; Weight-sharing; Sub-sampling
DOI
10.1117/12.2197152
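Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract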
Convolutional Neural Networks (CNNs), which have been successful at achieving translation invariance in many image processing tasks, are investigated for continuous speech recognition in this paper. Compared to Deep Neural Networks (DNNs), which have proven successful in many speech recognition tasks, CNNs can reduce neural network model size significantly while achieving even better recognition accuracy. Experiments on the standard TIMIT speech corpus showed that CNNs outperformed DNNs in terms of accuracy even with a smaller model size.
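As a rough illustration of why the weight-sharing and sub-sampling named in the keywords can shrink a model, the sketch below counts the parameters of one fully connected layer and one convolutional layer applied to the same input. Every size in it (40 frequency bands, 11 context frames, 1024 hidden units, 128 filters of width 8, pooling size 3) is a hypothetical value chosen for the arithmetic and is not taken from the paper. It is not the authors' architecture, and a layer-level count says nothing about total model size.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def shared_conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a convolutional layer whose filters
    are shared across every position along the frequency axis."""
    return kernel_size * in_channels * out_channels + out_channels


# Hypothetical input: 40 filter-bank bands, 11 context frames as channels.
N_BANDS, N_FRAMES = 40, 11

# Hypothetical fully connected layer with 1024 hidden units.
dense = dense_params(N_BANDS * N_FRAMES, 1024)

# Hypothetical convolutional layer: 128 filters spanning 8 bands each.
KERNEL, N_FILTERS, POOL = 8, 128, 3
conv = shared_conv_params(KERNEL, N_FRAMES, N_FILTERS)

# Sub-sampling: non-overlapping max-pooling over the convolution outputs.
conv_positions = N_BANDS - KERNEL + 1    # valid positions along frequency
pooled_positions = conv_positions // POOL

print(f"fully connected layer parameters: {dense}")
print(f"shared-weight conv parameters:    {conv}")
print(f"outputs per filter after pooling: {pooled_positions}")
```

Because the same filter weights are reused at every frequency position, the convolutional layer's parameter count depends on the filter width rather than on the full input width, and pooling reduces the number of activations passed to later layers.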
Pages: 6
Related papers
50 items in total
  • [41] Speech Emotion Recognition from Spectrograms with Deep Convolutional Neural Network
    Badshah, Abdul Malik
    Ahmad, Jamil
    Rahim, Nasir
    Baik, Sung Wook
    2017 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), 2017, : 125 - 129
  • [42] Speech Emotion Recognition in Neurological Disorders Using Convolutional Neural Network
    Zisad, Sharif Noor
    Hossain, Mohammad Shahadat
    Andersson, Karl
    BRAIN INFORMATICS, BI 2020, 2020, 12241 : 287 - 296
  • [43] Visual Speech Recognition of Korean Words Using Convolutional Neural Network
    Lee, Sung-Won
    Yu, Je-Hun
    Park, Seung Min
    Sim, Kwee-Bo
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2019, 19 (01) : 1 - 9
  • [44] Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition
    Abdel-Hamid, Ossama
    Deng, Li
    Yu, Dong
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3365 - 3369
  • [45] Convolutional Time Delay Neural Network for Khmer Automatic Speech Recognition
    Srun, Nalin
    Leang, Sotheara
    Thu, Ye Kyaw
    Sam, Sethserey
    2022 17TH INTERNATIONAL JOINT SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING (ISAI-NLP 2022) / 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INTERNET OF THINGS (AIOT 2022), 2022,
  • [46] Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network
    Alluhaidan, Ala Saleh
    Saidani, Oumaima
    Jahangir, Rashid
    Nauman, Muhammad Asif
    Neffati, Omnia Saidani
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [47] Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Guan, Haotian
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 2018, 11304 : 62 - 71
  • [48] Speech Enhancement based on Deep Convolutional Neural Network
    Nuthakki, Ramesh
    Masanta, Payel
    Yukta, T. N.
    PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021, : 770 - 775
  • [49] Convolutional Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Mohamed, Abdel-Rahman
    Jiang, Hui
    Deng, Li
    Penn, Gerald
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
  • [50] Deep and shallow features fusion based on deep convolutional neural network for speech emotion recognition
    Sun, L.
    Chen, J.
    Xie, K.
    Gu, T.
    International Journal of Speech Technology, 2018, 21 (04) : 931 - 940