Multi-Stream Spectro-Temporal Features for Robust Speech Recognition

Cited by: 0
Authors
Zhao, Sherry Y. [1]
Morgan, Nelson [1]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
spectro-temporal features; speech recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A multi-stream approach to utilizing the inherently large number of spectro-temporal features for speech recognition is investigated in this study. Instead of reducing the feature-space dimension, this method divides the features into streams so that each represents a patch of information in the spectro-temporal response field. When used in combination with MFCCs for speech recognition under both clean and noisy conditions, multi-stream spectro-temporal features provide roughly a 30% relative improvement in word-error rate over using MFCCs alone. The result suggests that the multi-stream approach may be an effective way to handle and utilize spectro-temporal features for speech applications.
Pages: 898-901
Number of pages: 4
Related papers
50 in total
  • [1] Hierarchical spectro-temporal features for robust speech recognition
    Domont, Xavier
    Heckmann, Martin
    Joublin, Frank
    Goerick, Christian
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4417 - 4420
  • [2] Multi-Stream to Many-Stream: Using Spectro-Temporal Features for ASR
    Zhao, Sherry Y.
    Ravuri, Suman
    Morgan, Nelson
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2935 - 2938
  • [3] An Experimental Analysis on Integrating Multi-Stream Spectro-Temporal, Cepstral and Pitch Information for Mandarin Speech Recognition
    Wang, Yow-Bang
    Li, Shang-Wen
    Lee, Lin-shan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10): : 2006 - 2014
  • [4] Spectro-Temporal Modulations for Robust Speech Emotion Recognition
    Yeh, Lan-Ying
    Chi, Tai-Shih
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 789 - 792
  • [5] Informative Spectro-Temporal Bottleneck Features for Noise-Robust Speech Recognition
    Chang, Shuo-Yiin
    Morgan, Nelson
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 99 - 103
  • [6] Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition
    Geng, Mengzhe
    Liu, Shansong
    Yu, Jianwei
    Xie, Xurong
    Hu, Shoukang
    Ye, Zi
    Jin, Zengrui
    Liu, Xunying
    Meng, Helen
    [J]. INTERSPEECH 2021, 2021, : 4793 - 4797
  • [7] Spectro-Temporal Directional Derivative Features for Automatic Speech Recognition
    Gibson, James
    Van Segbroeck, Maarten
    Ortega, Antonio
    Georgiou, Panayiotis
    Narayanan, Shrikanth
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 872 - 875
  • [8] Robust emotion recognition by spectro-temporal modulation statistic features
    Tai-Shih Chi
    Lan-Ying Yeh
    Chin-Cheng Hsu
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2012, 3 : 47 - 60
  • [9] MULTI-STREAM SPECTRO-TEMPORAL AND CEPSTRAL FEATURES BASED ON DATA-DRIVEN HIERARCHICAL PHONEME CLUSTERS
    Li, Shang-wen
    Sun, Liang-che
    Lee, Lin-shan
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5196 - 5199
  • [10] Robust emotion recognition by spectro-temporal modulation statistic features
    Chi, Tai-Shih
    Yeh, Lan-Ying
    Hsu, Chin-Cheng
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2012, 3 (01) : 47 - 60