A multi-stream approach to audiovisual automatic speech recognition

Cited by: 1

Authors:

Hasegawa-Johnson, Mark [1]

Affiliations:

[1] Univ Illinois, Urbana, IL 61801 USA

Source:

2007 IEEE Ninth Workshop on Multimedia Signal Processing | 2007

Keywords:

DOI:

10.1109/MMSP.2007.4412884

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a multi-stream approach to automatic audiovisual speech recognition, based in part on Hickok and Poeppel's dual-stream model of human speech processing. The dual-stream model proposes that semantic networks may be accessed by at least three parallel neural streams: at least two ventral streams that map directly from acoustics to words (with different time scales), and at least one dorsal stream that maps from acoustics to articulation. Our implementation represents each of these streams by a dynamic Bayesian network; disagreements between the three streams are resolved using a voting scheme. The proposed algorithm was tested using the CUAVE audiovisual speech corpus. Results indicate that the ventral stream model tends to make fewer mistakes in the labeling of vowels, while the dorsal stream model tends to make fewer mistakes in the labeling of consonants; the recognizer voting scheme takes advantage of these differences to reduce overall word error rate.
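The abstract states that disagreements between the three streams are resolved by a voting scheme but gives no details of it. As a rough illustration only, the following sketch shows one simple way such a vote could work: a word-level majority vote over hypotheses that are assumed to be already aligned and of equal length. The function name `vote`, the tie-breaking rule (the earliest-listed stream wins), and the example word sequences are all assumptions made here and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def vote(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """Word-level majority vote over equal-length, pre-aligned hypotheses.

    Ties fall back to the earliest-listed stream; this is an arbitrary
    choice made for the sketch, not the paper's rule.
    """
    if not hypotheses:
        return []
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must be aligned to the same length")
    result = []
    for slot in zip(*hypotheses):
        counts = Counter(slot)
        best = max(counts.values())
        # The first word, in stream order, that reaches the top count wins.
        result.append(next(w for w in slot if counts[w] == best))
    return result


# Hypothetical outputs of three stream recognizers for one digit string.
ventral_fast = ["one", "nine", "three"]
ventral_slow = ["one", "five", "three"]
dorsal = ["one", "five", "tree"]
combined = vote([ventral_fast, ventral_slow, dorsal])
```

In this made-up example each stream makes errors in different positions, so the combined sequence can be correct even when no single stream is. A real system would also need an alignment step before voting, since recognizer outputs generally differ in length.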

Citation:

Pages: 328-331

Number of pages: 4
