A multi-stream approach to audiovisual automatic speech recognition

Cited by: 1
Author
Hasegawa-Johnson, Mark [1]
Affiliation
[1] Univ Illinois, Urbana, IL 61801 USA
DOI
10.1109/MMSP.2007.4412884
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a multi-stream approach to automatic audiovisual speech recognition, based in part on Hickok and Poeppel's dual-stream model of human speech processing. The dual-stream model proposes that semantic networks may be accessed by at least three parallel neural streams: at least two ventral streams that map directly from acoustics to words (with different time scales), and at least one dorsal stream that maps from acoustics to articulation. Our implementation represents each of these streams by a dynamic Bayesian network; disagreements between the three streams are resolved using a voting scheme. The proposed algorithm was tested using the CUAVE audiovisual speech corpus. Results indicate that the ventral stream model tends to make fewer mistakes in the labeling of vowels, while the dorsal stream model tends to make fewer mistakes in the labeling of consonants; the recognizer voting scheme takes advantage of these differences to reduce overall word error rate.
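The abstract does not specify how the voting scheme works, so the following is only a minimal sketch of one plausible reading: a word-by-word majority vote over three recognizer outputs. It assumes the three hypotheses have already been aligned to equal length (a real system would need an alignment step first), and the function name `vote`, the `fallback` parameter, and the example word sequences are all hypothetical, not taken from the paper.

```python
from collections import Counter


def vote(hypotheses, fallback=0):
    """Word-by-word majority vote over pre-aligned hypotheses.

    hypotheses: list of word lists, one per stream (assumed equal length).
    fallback:   index of the stream to trust when no word has a strict
                majority (an illustrative tie-break, not the paper's rule).
    """
    lengths = {len(h) for h in hypotheses}
    if len(lengths) != 1:
        raise ValueError("need at least one hypothesis, all of equal length")

    result = []
    for words in zip(*hypotheses):
        top, count = Counter(words).most_common(1)[0]
        # Accept the most common word only if more than half the streams agree.
        if count > len(words) / 2:
            result.append(top)
        else:
            result.append(words[fallback])
    return result


# Hypothetical outputs from two ventral-stream models and one dorsal-stream model.
ventral_fast = ["one", "two", "three"]
ventral_slow = ["one", "too", "three"]
dorsal = ["nine", "two", "free"]

combined = vote([ventral_fast, ventral_slow, dorsal])
```

In this toy case each stream makes a different mistake, and the vote recovers a sequence that none of the erroneous positions can outvote, which is the kind of complementary-error effect the abstract attributes to the ventral and dorsal models.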
Pages: 328-331
Page count: 4