A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition

Cited by: 0

Authors:

Morales-Cordovilla, Juan A. [1]

Cabanas-Molero, Pablo

Peinado, Antonio M. [1]

Sanchez, Victoria [1]

Affiliation:

[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain

Source:

ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES | 2012, Vol. 328

Keywords:

pitch extractor; pitch line; CASA; DTW; noise; robust speech recognition

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a robust pitch extractor for Automatic Speech Recognition, based on selecting pitch lines of a tonegram (a representation of the different pitch energies at each frame time). First, the tonegram and its maximum energy regions are extracted, and a Dynamic Time Warping algorithm finds the most energetic trajectories, or pitch lines, from these regions. A second stage estimates the tonegram of the most energetic lines by applying Computational Auditory Scene Analysis rules which reject and group octave-related lines. The mean pitch of the speaker is then estimated, and the final pitch is obtained by rejecting lines that lie too far from the mean pitch. The proposed pitch extractor is evaluated in a novel way, by means of the word accuracy of a Missing Data recognizer on the Aurora-2 database.
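
The first stage described in the abstract (tracing the most energetic trajectory through a tonegram) can be illustrated with a small dynamic-programming path search. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract does not give the DTW cost function or its constraints, so the function name `best_pitch_line`, the `max_jump` continuity limit, and the toy tonegram values are all hypothetical, and the later CASA octave rules and mean-pitch rejection are not modelled.

```python
def best_pitch_line(tonegram, max_jump=1):
    """Return one pitch-bin index per frame for the most energetic smooth path.

    tonegram: list of frames, each a list of energies per pitch candidate.
    max_jump: assumed limit on how many bins the path may move between frames.
    """
    n_frames = len(tonegram)
    n_bins = len(tonegram[0])

    # score[t][p]: best accumulated energy of any allowed path ending at (t, p)
    score = [list(tonegram[0])]
    # back[t][p]: bin chosen at frame t-1 on that best path
    back = [[0] * n_bins]

    for t in range(1, n_frames):
        row, ptr = [], []
        for p in range(n_bins):
            lo = max(0, p - max_jump)
            hi = min(n_bins - 1, p + max_jump)
            # best predecessor within the allowed jump range
            q_best = max(range(lo, hi + 1), key=lambda q: score[t - 1][q])
            row.append(score[t - 1][q_best] + tonegram[t][p])
            ptr.append(q_best)
        score.append(row)
        back.append(ptr)

    # backtrack from the best final bin
    p = max(range(n_bins), key=lambda q: score[-1][q])
    path = [p]
    for t in range(n_frames - 1, 0, -1):
        p = back[t][p]
        path.append(p)
    path.reverse()
    return path


# Toy tonegram: 4 frames x 4 pitch candidates (made-up energies).
tonegram = [
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9, 0.1],
    [0.9, 0.0, 0.7, 0.2],
]
path = best_pitch_line(tonegram)
print(path)  # [1, 1, 2, 2]
```

In this toy input the isolated high-energy cell in the last frame (bin 0) is ignored because reaching it would need a jump larger than `max_jump`, which is the kind of continuity preference a pitch-line search relies on.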


Pages: 197-206

Page count: 10
