Convolutional support vector machines for speech recognition

Cited by: 14

Authors:

Passricha, Vishal [1]

Aggarwal, Rajesh Kumar [1]

Affiliations:

[1] Natl Inst Technol, Comp Engn Dept, Kurukshetra, Haryana, India

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2019 / Vol. 22 / Issue 03

Keywords:

ASR; CNN; SVM; Maximum margin; CSVM; NEURAL-NETWORKS; FEATURES; MODELS;

DOI:

10.1007/s10772-018-09584-4

Chinese Library Classification number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance on automatic speech recognition. Most CNNs use a softmax activation function for prediction and are trained by minimizing the cross-entropy loss. This paper proposes a new deep architecture that combines two heterogeneous classification techniques, CNNs and support vector machines (SVMs). In the proposed model, features are learned by convolution layers and classified by SVMs. The last layer of the CNN, i.e. the softmax layer, is replaced by SVMs to deal efficiently with high-dimensional features. This model should be interpreted as a special form of structured SVM and is named the convolutional support vector machine (CSVM). Instead of training each component separately, the parameters of the CNN and the SVMs are trained jointly using frame-level max-margin, sequence-level max-margin, and state-level minimum Bayes risk criteria. The performance of CSVM is evaluated on the TIMIT and Wall Street Journal datasets for phone recognition. By incorporating the features of both CNNs and SVMs, CSVM improves the result by 13.33% and 2.31% over the baseline CNN and segmental recurrent neural networks, respectively.
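To make the contrast between a softmax output layer and a max-margin one concrete, the following is a minimal NumPy sketch of a frame-level multiclass hinge loss next to softmax cross-entropy. It is an illustration only, not the authors' implementation: the function names, the `margin` parameter, and the use of the Crammer-Singer form of the hinge loss are assumptions, and the sequence-level and minimum Bayes risk criteria mentioned in the abstract are not covered. The `scores` array stands in for the linear outputs computed on top of the learned convolutional features.

```python
import numpy as np


def frame_max_margin_loss(scores, labels, margin=1.0):
    """Multiclass hinge loss (Crammer-Singer form), averaged over frames.

    scores: (n_frames, n_classes) linear outputs of the last layer (assumed input)
    labels: (n_frames,) integer class ids
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(scores.shape[0])
    correct = scores[rows, labels]
    # Every competing class gets the margin added; the correct class does not.
    augmented = scores + margin
    augmented[rows, labels] = correct
    # Per-frame loss: how far the best margin-augmented competitor
    # exceeds the correct score (zero when the margin is satisfied).
    return float(np.mean(np.max(augmented, axis=1) - correct))


def softmax_cross_entropy(scores, labels):
    """Softmax cross-entropy, averaged over frames, for comparison."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(scores.shape[0]), labels]))


# Hypothetical scores for two frames and three classes.
example_scores = [[3.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
example_labels = [0, 1]
hinge = frame_max_margin_loss(example_scores, example_labels)
xent = softmax_cross_entropy(example_scores, example_labels)
```

In this toy input the first frame is classified with a margin larger than 1 and contributes no hinge loss, while it still contributes a small cross-entropy term; this difference in how confidently correct frames are treated is the kind of behaviour that separates the two criteria.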


Pages: 601 - 609

Page count: 9

50 items in total

[21] Speaker Recognition from Coded Speech Using Support Vector Machines
Janicki, Artur
Staroszczyk, Tomasz
[J]. TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836 : 291 - 298
[22] VISUAL SPEECH RECOGNITION USING OPTICAL FLOW AND SUPPORT VECTOR MACHINES
Shaikh, Ayaz A.
Kumar, Dinesh K.
Gubbi, Jayavardhana
[J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2011, 10 (02) : 167 - 187
[23] Structured Support Vector Machines for Noise Robust Continuous Speech Recognition
Zhang, Shi-Xiong
Gales, M. J. F.
[J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 996 - 999
[24] VISUAL SPEECH RECOGNITION USING DYNAMIC FEATURES AND SUPPORT VECTOR MACHINES
Yau, Wai Chee
Kumar, Dinesh Kant
Arjunan, Sridhar Poosapadi
[J]. INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2008, 8 (03) : 419 - 437
[25] Tone recognition of continuous Cantonese speech based on support vector machines
Peng, G
Wang, WSY
[J]. SPEECH COMMUNICATION, 2005, 45 (01) : 49 - 62
[26] A Support Vector Machines-based rejection technique for speech recognition
Ma, CX
Randolph, MA
Drish, J
[J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 381 - 384
[27] Speech Emotion Recognition using Convolutional Long Short-Term Memory Neural Network and Support Vector Machines
Kurpukdee, Nattapong
Koriyama, Tomoki
Kobayashi, Takao
Kasuriya, Sawit
Wutiwiwatchai, Chai
Lamsrichan, Poonlap
[J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1744 - 1749
[28] Cattle Brand Recognition using Convolutional Neural Network and Support Vector Machines
Silva, C.
Welfer, D.
Gioda, F. P.
Dornelles, C.
[J]. IEEE LATIN AMERICA TRANSACTIONS, 2017, 15 (02) : 310 - 316
[29] Speech Recognition using Wavelet Packets, Neural Networks and Support Vector Machines
Kulkarni, Purva
Kulkarni, Saili
Mulange, Sucheta
Dand, Aneri
Cheeran, Alice N.
[J]. 2014 INTERNATIONAL CONFERENCE ON SIGNAL PROPAGATION AND COMPUTER TECHNOLOGY (ICSPCT 2014), 2014, : 451 - 455
[30] Robust Noisy Speech Recognition Using Deep Neural Support Vector Machines
Amami, Rimah
Ben Ayed, Dorra
[J]. DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2019, 800 : 300 - 307
