Tandem connectionist feature extraction for conversational speech recognition

Cited by: 0
Authors
Zhu, QF
Chen, B
Morgan, N
Stolcke, A
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] SRI Int, Menlo Pk, CA 94025 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-Layer Perceptrons (MLPs) can be used in automatic speech recognition in many ways. One application that has received particular attention over the last few years is the Tandem approach, described in [7] and in more recent publications. Here we discuss the characteristics of the MLP-based features used in the Tandem approach and conclude with a report on their application to conversational speech recognition. The paper shows that MLP transformations yield variables with regular distributions, which can be further modified by taking the logarithm so that the distribution is easier to model with a Gaussian-HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. We also report recognition results showing that MLP features can significantly improve recognition performance on the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even for complex systems incorporating MMIE training and other enhancements.
Pages: 223-231
Page count: 9
Related papers (10 of 50 shown)
  • [1] Tandem connectionist feature extraction for conventional HMM systems
    Hermansky, H
    Ellis, DPW
    Sharma, S
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1635 - 1638
  • [2] Trapping conversational speech: Extending trap/tandem approaches to conversational telephone speech recognition
    Morgan, N
    Chen, BY
    Zhu, QF
    Stolcke, A
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 537 - 540
  • [3] An evaluation of a nonlinear feature transformation for conversational speech recognition
    Omar, MK
    Kingsbury, B
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 785 - 788
  • [4] Speech recognition as feature extraction for speaker recognition
    Stolcke, A.
    Shriberg, E.
    Ferrer, L.
    Kajarekar, S.
    Sonmez, K.
    Tur, G.
    [J]. 2007 IEEE WORKSHOP ON SIGNAL PROCESSING APPLICATIONS FOR PUBLIC SECURITY AND FORENSICS, 2007, : 39 - +
  • [5] Optimizing feature extraction for speech recognition
    Lee, CH
    Hyun, DH
    Choi, ES
    Go, JW
    Lee, CY
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (01): : 80 - 87
  • [6] Feature extraction for robust speech recognition
    Dharanipragada, S
    [J]. 2002 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, PROCEEDINGS, 2002, : 855 - 858
  • [7] Visual speech feature extraction for improved speech recognition
    Zhang, X
    Mersereau, RM
    Clements, M
    Broun, CC
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 1993 - 1996
  • [8] Composite Feature Extraction for Speech Emotion Recognition
    Fu, Yangzhi
    Yuan, Xiaochen
    [J]. 2020 IEEE 23RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2020), 2020, : 72 - 77
  • [9] Geometrical feature extraction for robust speech recognition
    Li, Xiaokun
    Kwan, Chiman
    [J]. 2005 39TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1 AND 2, 2005, : 558 - 562
  • [10] Reduced Feature Extraction for Emotional Speech Recognition
    Palo, Hemanta Kumar
    Mohanty, Mihir Narayan
    [J]. 2015 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2015,