Continuous Tamil Speech Recognition technique under non stationary noisy environments

Cited by: 4

Authors:

Kalamani, M. [1]

Krishnamoorthi, M. [2]

Valarmathi, R. S. [1]

Affiliations:

[1] Bannari Amman Inst Technol, Dept Elect & Commun Engn, Sathyamangalam 638401, Tamil Nadu, India

[2] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Sathyamangalam 638401, Tamil Nadu, India

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2019 / Vol. 22 / Issue 01

Keywords:

Noise estimation; Speech enhancement; Speech segmentation; MFCC; FCM; EM-GMM; CTSR; Noisy environments; FILTERED-X LMS; FEEDBACK CANCELLATION; ALGORITHM; MODELS;

DOI:

10.1007/s10772-018-09580-8

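Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification code: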

0808 ; 0809 ;

Abstract:

In recent years, the need for continuous speech recognition systems for the Tamil language has grown considerably. This work proposes an efficient Continuous Tamil Speech Recognition (CTSR) technique for non-stationary noisy environments. The work consists of two stages: speech enhancement and modelling. A modified Modulation Magnitude Estimation based Spectral Subtraction with Chi-Square Distribution based Noise Estimation (SS-NE) algorithm is proposed to enhance the noisy Tamil speech signal under various non-stationary noise environments. To extract speech segments from the continuous speech, the enhanced signal is then segmented using a combination of its short-time signal energy and spectral centroid features. In this work, 26 mel frequency cepstral coefficients (MFCCs) per frame are found to be optimal and are used as the acoustic feature vector for each frame. Fuzzy C-Means (FCM) clustering is used to cluster the extracted feature vectors into discrete symbols; the evaluation results show that the optimal number of clusters C is 5. Finally, Tamil speech from various speakers is recognized using an Expectation Maximization Gaussian Mixture Model (EM-GMM) with 16 component densities, applied to the continuous measurements of labelled features from FCM clustering, in order to reduce the word error rate (WER). The simulation results show that, compared with existing algorithms under different noisy environments, the proposed FCM with EM-GMM model for CTSR improves recognition accuracy by 1.2 to 4.4% by reducing the WER by 1.6 to 5.47%.
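
The clustering step described in the abstract can be illustrated with a minimal sketch of standard Fuzzy C-Means applied to 26-dimensional per-frame feature vectors with C = 5. This is not the authors' implementation: the abstract does not give the fuzzifier, initialization, or distance measure, so the fuzzifier m = 2, the random initialization, the Euclidean distance, and the function name `fuzzy_c_means` are assumptions, and the input is synthetic random data standing in for real MFCC frames.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=5, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard Fuzzy C-Means on the rows of X; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in for 200 frames of 26 MFCCs each (assumption, not real speech).
features = np.random.default_rng(1).normal(size=(200, 26))
centers, U = fuzzy_c_means(features, n_clusters=5)
labels = U.argmax(axis=1)  # one discrete symbol (cluster index) per frame
```

Taking the arg-max membership per frame is one simple way to turn the soft memberships into the discrete symbols the abstract mentions; the paper may derive its labels differently.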

Pages: 47-58

Page count: 12

