A MULTI-STREAM ASR FRAMEWORK FOR BLSTM MODELING OF CONVERSATIONAL SPEECH

Cited by: 0
Authors
Woellmer, Martin [1]
Eyben, Florian [1]
Schuller, Bjoern [1]
Rigoll, Gerhard [1]
Affiliations
[1] Tech Univ Munich, Inst Human Machine Commun, D-8000 Munich, Germany
Keywords
Long Short-Term Memory; Context Modeling; Conversational Speech Recognition; Recurrent Neural Networks; Bidirectional LSTM
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
We propose a novel multi-stream framework for continuous conversational speech recognition which employs bidirectional Long Short-Term Memory (BLSTM) networks for phoneme prediction. The BLSTM architecture allows recurrent neural nets to model long-range context, which led to improved ASR performance when combined with conventional triphone modeling in a Tandem system. In this paper, we extend the principle of joint BLSTM and triphone modeling to a multi-stream system which uses MFCC features and BLSTM predictions as observations originating from two independent data streams. Using the COSINE database, we show that this technique prevails over a recently proposed single-stream Tandem system as well as over a conventional HMM recognizer.
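The two-stream observation model described in the abstract can be illustrated with a minimal sketch. The abstract does not give the emission formula, so everything below is an assumption: the BLSTM stream is treated as a discrete phoneme index, the MFCC stream is scored with a single diagonal-covariance Gaussian per state, and the two streams are combined as a weighted sum of log-likelihoods, which is a common form for multi-stream HMMs. The names `multistream_log_emission`, `w_mfcc`, `w_blstm`, and the toy state dictionaries are hypothetical and are not taken from the paper.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_emission(x, b, state, w_mfcc=1.0, w_blstm=1.0):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    x: continuous feature vector (stands in for an MFCC frame)
    b: discrete phoneme index (stands in for the BLSTM prediction)
    state: hypothetical dict holding both streams' parameters
    w_mfcc, w_blstm: assumed stream weights
    """
    log_p_mfcc = diag_gaussian_logpdf(x, state["mean"], state["var"])
    log_p_blstm = math.log(state["phoneme_probs"][b])
    return w_mfcc * log_p_mfcc + w_blstm * log_p_blstm


# Two toy states with identical Gaussians but different discrete distributions
# over three phoneme labels (all values are made up for illustration).
state_a = {"mean": [0.0, 0.0], "var": [1.0, 1.0], "phoneme_probs": [0.8, 0.1, 0.1]}
state_b = {"mean": [0.0, 0.0], "var": [1.0, 1.0], "phoneme_probs": [0.1, 0.8, 0.1]}

frame = [0.0, 0.0]      # toy continuous observation
blstm_prediction = 0    # toy discrete observation

score_a = multistream_log_emission(frame, blstm_prediction, state_a)
score_b = multistream_log_emission(frame, blstm_prediction, state_b)
print(score_a > score_b)
```

In this toy setup the continuous stream cannot separate the two states, so the discrete stream alone decides the ranking. This is meant only to show how a second stream can add evidence under the assumptions above.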
Pages: 4860-4863
Page count: 4