Multi-Stream Spectro-Temporal Features for Robust Speech Recognition

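Cited by: 0
Authors
Zhao, Sherry Y. [1]
Morgan, Nelson [1]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
spectro-temporal features; speech recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract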
A multi-stream approach to utilizing the inherently large number of spectro-temporal features for speech recognition is investigated in this study. Instead of reducing the feature-space dimension, this method divides the features into streams so that each represents a patch of information in the spectro-temporal response field. When used in combination with MFCCs for speech recognition under both clean and noisy conditions, multi-stream spectro-temporal features provide roughly a 30% relative improvement in word-error rate over using MFCCs alone. The result suggests that the multi-stream approach may be an effective way to handle and utilize spectro-temporal features for speech applications.
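As a rough illustration of the two ideas in the abstract, the sketch below partitions one frame's feature vector into contiguous streams and computes a relative word-error-rate improvement. It is not the authors' implementation: the function names, the stream sizes, and the WER values are hypothetical, and the abstract does not say how the spectro-temporal patches are actually chosen.

```python
def split_into_streams(feature_vector, stream_sizes):
    """Partition one frame's features into contiguous streams.

    Assumption: each stream is a contiguous slice of the vector. The
    abstract only says each stream covers a patch of the spectro-temporal
    response field, so this grouping is a placeholder.
    """
    if sum(stream_sizes) != len(feature_vector):
        raise ValueError("stream sizes must add up to the feature dimension")
    streams = []
    start = 0
    for size in stream_sizes:
        streams.append(feature_vector[start:start + size])
        start += size
    return streams


def relative_wer_improvement(baseline_wer, new_wer):
    """Relative improvement: the fraction of the baseline WER removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical 12-dimensional frame split into three 4-dimensional streams.
frame = list(range(12))
streams = split_into_streams(frame, [4, 4, 4])

# Hypothetical WERs: going from 10.0% to 7.0% is a 30% relative improvement.
improvement = relative_wer_improvement(10.0, 7.0)
```

In this sketch the feature dimension is preserved: the streams together contain every original feature, which mirrors the abstract's point that the method divides the feature space rather than reducing it.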
Pages: 898-901
Number of pages: 4