Audio-visual speech recognition using convolutive bottleneck networks for a person with severe hearing loss

Cited by: 0
Authors
Takashima, Yuki [1]
Kakihara, Yasuhiro [1]
Aihara, Ryo [1]
Takiguchi, Tetsuya [1]
Ariki, Yasuo [1]
Mitani, Nobuyuki [2]
Omori, Kiyohiro [2]
Nakazono, Kaoru [2]
Affiliations
[1] Graduate School of System Informatics, Kobe University, Kobe, Hyogo 657-8501, Japan
[2] Hyogo Institute of Assistive Technology, Kobe, Hyogo 651-2134, Japan
Keywords
Assistive technology; Audio-visual speech recognition; Conventional methods; Feature extraction methods; Lip reading; Multimodal; Robust feature extraction; Speaker-independent model
DOI
10.2197/ipsjtcva.7.64
Abstract
In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. For a person with this type of articulation disorder, the speech style is quite different from that of people without hearing loss, with the result that a speaker-independent model trained on unimpaired speakers is of little use for recognizing it. We investigate an audio-visual speech recognition system for a person with severe hearing loss in noisy environments, in which a robust feature extraction method using a convolutive bottleneck network (CBN) is applied to audio-visual data. We confirmed the effectiveness of this approach through word-recognition experiments in noisy environments, where the CBN-based feature extraction method outperformed the conventional methods. © 2015 Information Processing Society of Japan.
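As a rough illustration of the CBN idea described in the abstract (a convolutional network whose middle fully connected layer is deliberately narrow, with the bottleneck activations used as robust features for a downstream recognizer), the following is a minimal sketch in PyTorch. All layer sizes, the input patch shape, and the number of target classes are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a convolutive bottleneck network (CBN) feature extractor.
# Assumes PyTorch and single-channel time-frequency (or lip-image) patches as input;
# the architecture details below are illustrative, not the authors' exact configuration.
import torch
import torch.nn as nn


class ConvolutiveBottleneckNetwork(nn.Module):
    def __init__(self, n_classes: int = 40, bottleneck_dim: int = 30):
        super().__init__()
        # Convolution + pooling front end over the input patch.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected layers with a narrow "bottleneck" in the middle;
        # the bottleneck activations serve as the extracted features.
        self.fc_in = nn.Sequential(nn.Flatten(), nn.LazyLinear(256), nn.ReLU())
        self.bottleneck = nn.Linear(256, bottleneck_dim)
        self.fc_out = nn.Sequential(nn.ReLU(), nn.Linear(bottleneck_dim, n_classes))

    def forward(self, x: torch.Tensor):
        h = self.fc_in(self.conv(x))
        z = self.bottleneck(h)           # bottleneck features for the recognizer back end
        return self.fc_out(z), z         # class logits (for training) and features


# Usage: a batch of 8 single-channel 40x11 patches (e.g., mel-filterbank frames).
model = ConvolutiveBottleneckNetwork()
logits, features = model(torch.randn(8, 1, 40, 11))
print(features.shape)  # torch.Size([8, 30])
```

In this kind of setup, the network is trained as a classifier (e.g., against frame-level phoneme labels via cross-entropy on the logits), and at test time only the low-dimensional bottleneck activations are kept as features for the speech recognizer.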
Pages: 64 - 68
Related papers (50 in total)
  • [31] An asynchronous DBN for audio-visual speech recognition
    Saenko, Kate
    Livescu, Karen
    2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 154 - +
  • [32] Audio-visual modeling for bimodal speech recognition
    Kaynak, MN
    Zhi, Q
    Cheok, AD
    Sengupta, K
    Chung, KC
    2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5: E-SYSTEMS AND E-MAN FOR CYBERNETICS IN CYBERSPACE, 2002, : 181 - 186
  • [33] Bimodal fusion in audio-visual speech recognition
    Zhang, XZ
    Mersereau, RM
    Clements, M
    2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS, 2002, : 964 - 967
  • [34] Guide to Audio-Visual Materials on Speech and Hearing Disorders
    [Anonymous]
    VOLTA REVIEW, 1953, 55 (02) : 102 - 102
  • [35] Audio-Visual Speech Enhancement using Deep Neural Networks
    Hou, Jen-Cheng
    Wang, Syu-Siang
    Lai, Ying-Hui
    Lin, Jen-Chun
    Tsao, Yu
    Chang, Hsiu-Wen
    Wang, Hsin-Min
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [36] Recognition of Isolated Digit Using Random Forest for Audio-Visual Speech Recognition
    Prashant Borde
    Sadanand Kulkarni
    Bharti Gawali
    Pravin Yannawar
    Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, 2022, 92 : 103 - 110
  • [37] Recognition of Isolated Digit Using Random Forest for Audio-Visual Speech Recognition
    Borde, Prashant
    Kulkarni, Sadanand
    Gawali, Bharti
    Yannawar, Pravin
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES INDIA SECTION A-PHYSICAL SCIENCES, 2022, 92 (01) : 103 - 110
  • [38] Using Twin-HMM-Based Audio-Visual Speech Enhancement as a Front-End for Robust Audio-Visual Speech Recognition
    Abdelaziz, Ahmed Hussen
    Zeiler, Steffen
    Kolossa, Dorothea
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 867 - 871
  • [39] Intermodal timing relations and audio-visual speech recognition by normal-hearing adults
    McGrath, M.
    Summerfield, Q.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1985, 77 (02) : 678 - 685
  • [40] Effects of hearing loss and audio-visual cues on children's speech processing speed
    Holt, Rebecca
    Bruggeman, Laurence
    Demuth, Katherine
    SPEECH COMMUNICATION, 2023, 146 : 11 - 21