Convolutional support vector machines for speech recognition

Cited by: 14
Authors
Passricha, Vishal [1]
Aggarwal, Rajesh Kumar [1]
Affiliations
[1] Natl Inst Technol, Comp Engn Dept, Kurukshetra, Haryana, India
Keywords
ASR; CNN; SVM; Maximum margin; CSVM; NEURAL-NETWORKS; FEATURES; MODELS;
DOI
10.1007/s10772-018-09584-4
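Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract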
Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance on automatic speech recognition. Most CNNs use a softmax activation function for prediction and are trained by minimizing the cross-entropy loss. This paper proposes a new deep architecture that combines two heterogeneous classification techniques, CNNs and support vector machines (SVMs). In the proposed model, features are learned by the convolution layers and classified by SVMs. The last layer of the CNN, i.e. the softmax layer, is replaced by SVMs to deal efficiently with high-dimensional features. The model can be interpreted as a special form of structured SVM and is named the convolutional support vector machine (CSVM). Instead of training each component separately, the parameters of the CNN and the SVMs are trained jointly using frame-level max-margin, sequence-level max-margin, and state-level minimum Bayes risk criteria. The performance of CSVM is evaluated on the TIMIT and Wall Street Journal datasets for phone recognition. By incorporating the features of both CNNs and SVMs, CSVM improves the results by 13.33% and 2.31% over a baseline CNN and segmental recurrent neural networks, respectively.
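As a rough illustration of what a frame-level max-margin criterion can look like when an SVM-style output layer takes the place of softmax, the sketch below computes a multiclass hinge loss (Crammer-Singer form) over per-frame class scores. It is not taken from the paper: the function name `multiclass_hinge_loss`, the `margin` parameter, and the toy scores are assumptions made for illustration, and the authors' exact objective may differ.

```python
import numpy as np

def multiclass_hinge_loss(scores, labels, margin=1.0):
    """Mean multiclass hinge loss over frames (illustrative sketch).

    scores: (n_frames, n_classes) array of class scores, e.g. the output of a
            linear layer applied to CNN features (hypothetical setup).
    labels: (n_frames,) array of integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    rows = np.arange(scores.shape[0])

    # Score assigned to the correct class of each frame.
    correct = scores[rows, labels]

    # Add the margin to every competing class and exclude the true class.
    augmented = scores + margin
    augmented[rows, labels] = -np.inf

    # The loss is zero when the correct class beats every competitor by at
    # least `margin`; otherwise it grows linearly with the violation.
    worst_competitor = augmented.max(axis=1)
    return np.maximum(0.0, worst_competitor - correct).mean()

# Toy example: two frames, three classes, both frames labelled class 0.
toy_scores = np.array([[2.0, 0.5, 0.0],
                       [0.2, 0.1, 1.0]])
toy_labels = np.array([0, 0])
loss = multiclass_hinge_loss(toy_scores, toy_labels)
```

In this toy case the first frame already satisfies the margin and contributes nothing, while the second frame is penalized because class 2 outscores the labelled class. Under joint training, the gradient of such a loss would flow back into the convolution layers as well as the output weights.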
Pages: 601-609
Number of pages: 9
Related papers
50 in total
  • [31] Speech Emotion Recognition Based on Fuzzy Least Squares Support Vector Machines
    Zhang, Shiqing
    [J]. 2008 7TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-23, 2008, : 1299 - 1302
  • [32] Support vector machines employing cross-correlation for emotional speech recognition
    Chandaka, Suryannarayana
    Chatterjee, Amitava
    Munshi, Sugata
    [J]. MEASUREMENT, 2009, 42 (04) : 611 - 618
  • [33] Implicit State-Tying for Support Vector Machines Based Speech Recognition
    Bolanos, Daniel
    Ward, Wayne
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 924 - 927
  • [34] Whispered Speech Recognition using Hidden Markov Models and Support Vector Machines
    Galic, Jovan
    Popovic, Branislav
    Pavlovic, Dragana Sumarac
    [J]. ACTA POLYTECHNICA HUNGARICA, 2018, 15 (05) : 11 - 29
  • [35] Lattice segmentation and support vector machines for large vocabulary continuous speech recognition
    Venkataramani, V
    Byrne, W
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 817 - 820
  • [36] Support vector machines for face recognition
    Guo, GD
    Li, SZ
    Chan, KL
    [J]. IMAGE AND VISION COMPUTING, 2001, 19 (9-10) : 631 - 638
  • [37] Iris recognition with support vector machines
    Roy, K
    Bhattacharya, P
    [J]. ADVANCES IN BIOMETRICS, PROCEEDINGS, 2006, 3832 : 486 - 492
  • [38] Comparative Analysis of Convolutional Neural Networks and Support Vector Machines for Automatic Target Recognition
    Gorovyi, Ievgen M.
    Sharapov, Dmytro S.
    [J]. 2017 5TH IEEE MICROWAVES, RADAR AND REMOTE SENSING SYMPOSIUM (MRRS), 2017, : 63 - 66
  • [39] Acoustic model combination for recognition of speech in multiple languages using support vector machines
    Gangashetty, SV
    Sekhar, CC
    Yegnanarayana, B
    [J]. 2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 3065 - 3069
  • [40] Design and Evaluation of Speech based Emotion Recognition System using Support Vector Machines
    Harshini, D.
    Pranjali, B.
    Ranjitha, M.
    Rushali, J.
    Manikandan, J.
    [J]. 2019 IEEE 16TH INDIA COUNCIL INTERNATIONAL CONFERENCE (IEEE INDICON 2019), 2019,