A Reassigned Front-End for Speech Recognition

Cited by: 0

Authors:

Tryfou, Georgina [1]

Omologo, Maurizio [1]

Affiliation:

[1] Fondazione Bruno Kessler, Via Sommarive 18, Trento, Italy

Source:

2017 25th European Signal Processing Conference (EUSIPCO) | 2017

Keywords:

TIME-FREQUENCY; REPRESENTATIONS; SCALE;

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic and Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

This paper introduces the use of TFRCC features, a time-frequency reassigned feature set, as a front-end for speech recognition. Compared to the power spectrogram, the time-frequency reassigned version is particularly helpful in describing the temporal and spectral features of speech signals simultaneously, as it offers an improved visualization of the various components. This attribute is exploited by the cepstral reassigned features, which are incorporated into a state-of-the-art speech recognizer. Experiments investigate the proposed features in various scenarios, starting from recognition of close-talk signals and gradually increasing the complexity of the task. The results show that these features outperform an MFCC baseline.


Pages: 553-557

Page count: 5
