Audio-visual speech recognition using convolutive bottleneck networks for a person with severe hearing loss

Cited by: 0
Authors
Takashima, Yuki [1]
Kakihara, Yasuhiro [1]
Aihara, Ryo [1]
Takiguchi, Tetsuya [1]
Ariki, Yasuo [1]
Mitani, Nobuyuki [2]
Omori, Kiyohiro [2]
Nakazono, Kaoru [2]
Affiliations
[1] Graduate School of System Informatics, Kobe University, Kobe, Hyogo 657-8501, Japan
[2] Hyogo Institute of Assistive Technology, Kobe, Hyogo 651-2134, Japan
Keywords
Assistive technology; Audio-visual speech recognition; Conventional methods; Feature extraction methods; Lip reading; Multimodal; Robust feature extraction; Speaker-independent model
DOI
10.2197/ipsjtcva.7.64
Abstract
In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. For a person with this type of articulation disorder, the speech style is quite different from that of people without hearing loss, with the result that a speaker-independent model trained on unimpaired speakers is of little use for recognizing it. We investigate an audio-visual speech recognition system for a person with severe hearing loss in noisy environments, in which a robust feature extraction method using a convolutive bottleneck network (CBN) is applied to audio-visual data. We confirmed the effectiveness of this approach through word-recognition experiments in noisy environments, where the CBN-based feature extraction method outperformed the conventional methods. © 2015 Information Processing Society of Japan.
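As a rough illustration of the CBN idea described in the abstract (a convolutional network whose middle fully connected layer is deliberately narrow, with the bottleneck activations used as robust features for a downstream recognizer), the following is a minimal sketch in PyTorch. All layer sizes, the input patch shape, and the number of target classes are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a convolutive bottleneck network (CBN) feature extractor.
# Assumes PyTorch and single-channel time-frequency (or lip-image) patches as input;
# the architecture details below are illustrative, not the authors' exact configuration.
import torch
import torch.nn as nn


class ConvolutiveBottleneckNetwork(nn.Module):
    def __init__(self, n_classes: int = 40, bottleneck_dim: int = 30):
        super().__init__()
        # Convolution + pooling front end over the input patch.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected layers with a narrow "bottleneck" in the middle;
        # the bottleneck activations serve as the extracted features.
        self.fc_in = nn.Sequential(nn.Flatten(), nn.LazyLinear(256), nn.ReLU())
        self.bottleneck = nn.Linear(256, bottleneck_dim)
        self.fc_out = nn.Sequential(nn.ReLU(), nn.Linear(bottleneck_dim, n_classes))

    def forward(self, x: torch.Tensor):
        h = self.fc_in(self.conv(x))
        z = self.bottleneck(h)           # bottleneck features for the recognizer back end
        return self.fc_out(z), z         # class logits (for training) and features


# Usage: a batch of 8 single-channel 40x11 patches (e.g., mel-filterbank frames).
model = ConvolutiveBottleneckNetwork()
logits, features = model(torch.randn(8, 1, 40, 11))
print(features.shape)  # torch.Size([8, 30])
```

In this kind of setup, the network is trained as a classifier (e.g., against frame-level phoneme labels via cross-entropy on the logits), and at test time only the low-dimensional bottleneck activations are kept as features for the speech recognizer.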
Pages: 64 - 68
Related papers (50 in total)
  • [31] An asynchronous DBN for audio-visual speech recognition
    Saenko, Kate
    Livescu, Karen
    2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 154 - +
  • [32] Audio-visual modeling for bimodal speech recognition
    Kaynak, MN
    Zhi, Q
    Cheok, AD
    Sengupta, K
    Chung, KC
    2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5: E-SYSTEMS AND E-MAN FOR CYBERNETICS IN CYBERSPACE, 2002, : 181 - 186
  • [33] Bimodal fusion in audio-visual speech recognition
    Zhang, XZ
    Mersereau, RM
    Clements, M
    2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS, 2002, : 964 - 967
  • [34] Guide to Audio-Visual Materials on Speech and Hearing Disorders
    [Anonymous]
    VOLTA REVIEW, 1953, 55 (02) : 102 - 102
  • [35] Audio-Visual Speech Enhancement using Deep Neural Networks
    Hou, Jen-Cheng
    Wang, Syu-Siang
    Lai, Ying-Hui
    Lin, Jen-Chun
    Tsao, Yu
    Chang, Hsiu-Wen
    Wang, Hsin-Min
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [36] Recognition of Isolated Digit Using Random Forest for Audio-Visual Speech Recognition
    Prashant Borde
    Sadanand Kulkarni
    Bharti Gawali
    Pravin Yannawar
    Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, 2022, 92 : 103 - 110
  • [37] Recognition of Isolated Digit Using Random Forest for Audio-Visual Speech Recognition
    Borde, Prashant
    Kulkarni, Sadanand
    Gawali, Bharti
    Yannawar, Pravin
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES INDIA SECTION A-PHYSICAL SCIENCES, 2022, 92 (01) : 103 - 110
  • [38] Using Twin-HMM-Based Audio-Visual Speech Enhancement as a Front-End for Robust Audio-Visual Speech Recognition
    Abdelaziz, Ahmed Hussen
    Zeiler, Steffen
    Kolossa, Dorothea
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 867 - 871
  • [39] Intermodal timing relations and audio-visual speech recognition by normal-hearing adults
    McGrath, M.
    Summerfield, Q.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1985, 77 (02) : 678 - 685
  • [40] Effects of hearing loss and audio-visual cues on children's speech processing speed
    Holt, Rebecca
    Bruggeman, Laurence
    Demuth, Katherine
    SPEECH COMMUNICATION, 2023, 146 : 11 - 21