A Reassigned Front-End for Speech Recognition

Cited by: 0
Authors
Tryfou, Georgina [1]
Omologo, Maurizio [1]
Affiliations
[1] Fdn Bruno Kessler, Via Sommarive 18, Trento, Italy
Keywords
TIME-FREQUENCY; REPRESENTATIONS; SCALE;
DOI
Not available
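Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code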
0808 ; 0809 ;
Abstract
This paper introduces the TFRCC features, a time-frequency reassigned feature set, as a front-end for speech recognition. Compared to the power spectrogram, the time-frequency reassigned version is particularly helpful for describing the temporal and spectral features of speech signals simultaneously, as it offers an improved visualization of the various components. This attribute is exploited by the cepstral reassigned features, which are incorporated in a state-of-the-art speech recognizer. The experiments evaluate the proposed features in various scenarios, starting from recognition of close-talk signals and gradually increasing the complexity of the task. The results show that these features outperform an MFCC baseline.
Pages: 553 - 557
Page count: 5