Audio-visual speech recognition using convolutive bottleneck networks for a person with severe hearing loss

Cited by: 0
Authors
Takashima, Yuki [1 ]
Kakihara, Yasuhiro [1 ]
Aihara, Ryo [1 ]
Takiguchi, Tetsuya [1 ]
Ariki, Yasuo [1 ]
Mitani, Nobuyuki [2 ]
Omori, Kiyohiro [2 ]
Nakazono, Kaoru [2 ]
Affiliations
[1] Graduate School of System Informatics, Kobe University, Kobe, Hyogo 657-8501, Japan
[2] Hyogo Institute of Assistive Technology, Kobe, Hyogo 651-2134, Japan
Keywords
Assistive technology; Audio-visual speech recognition; Conventional methods; Feature extraction methods; Lip reading; Multimodal; Robust feature extraction; Speaker-independent model
DOI
10.2197/ipsjtcva.7.64
Abstract
In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. In the case of a person with this type of articulation disorder, the speech style is quite different from that of people without hearing loss, with the result that a speaker-independent model trained on unimpaired speech is hardly useful for recognizing it. We investigate an audio-visual speech recognition system for a person with severe hearing loss in noisy environments, where a robust feature extraction method using a convolutive bottleneck network (CBN) is applied to audio-visual data. We confirmed the effectiveness of this approach through word-recognition experiments in noisy environments, in which the CBN-based feature extraction method outperformed conventional methods. © 2015 Information Processing Society of Japan.
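The abstract does not spell out the CBN architecture, so the following is only a minimal sketch of the general idea: a convolutional network with a narrow "bottleneck" layer whose activations are taken as robust features for a downstream recognizer. The layer sizes, input shapes, class count, and the use of PyTorch are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal convolutive bottleneck network (CBN) sketch.
# Assumption: 32x32 single-channel patches (e.g., time-frequency or lip-image
# patches) and a 30-dimensional bottleneck; these values are hypothetical.
import torch
import torch.nn as nn


class ConvolutiveBottleneckNetwork(nn.Module):
    def __init__(self, n_classes: int, bottleneck_dim: int = 30):
        super().__init__()
        # Convolution + pooling stages over the input patch.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Narrow bottleneck layer: its activations serve as the features.
        self.bottleneck = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim), nn.ReLU(),
        )
        # Classification head used only while training the network
        # (e.g., against phoneme or word targets).
        self.head = nn.Linear(bottleneck_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.bottleneck(self.conv(x)))

    def extract_features(self, x: torch.Tensor) -> torch.Tensor:
        # Bottleneck activations are what would be passed on to the
        # downstream recognizer in place of conventional features.
        return self.bottleneck(self.conv(x))


# Usage example: a batch of four 32x32 patches yields 30-dim feature vectors.
cbn = ConvolutiveBottleneckNetwork(n_classes=40)
patches = torch.randn(4, 1, 32, 32)
features = cbn.extract_features(patches)  # shape: (4, 30)
```

In this kind of setup, the classifier head is discarded after training and only the compact bottleneck features are kept, which is what makes the representation reusable across audio and visual streams.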
Pages: 64-68
Related Papers (50 in total)
  • [1] Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss
    Takashima, Yuki
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    Mitani, Nobuyuki
    Omori, Kiyohiro
    Nakazono, Kaoru
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 277 - 281
  • [2] Integration of Deep Bottleneck Features for Audio-Visual Speech Recognition
    Ninomiya, Hiroshi
    Kitaoka, Norihide
    Tamura, Satoshi
    Iribe, Yurie
    Takeda, Kazuya
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 563 - 567
  • [3] Audio-visual speech recognition using red exclusion and neural networks
    Lewis, TW
    Powers, DMW
    JOURNAL OF RESEARCH AND PRACTICE IN INFORMATION TECHNOLOGY, 2003, 35 (01): : 41 - 64
  • [4] Dynamic Bayesian Networks for Audio-Visual Speech Recognition
    Ara V. Nefian
    Luhong Liang
    Xiaobo Pi
    Xiaoxing Liu
    Kevin Murphy
    EURASIP Journal on Advances in Signal Processing, 2002
  • [5] Dynamic Bayesian networks for audio-visual speech recognition
    Nefian, AV
    Liang, LH
    Pi, XB
    Liu, XX
    Murphy, K
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1274 - 1288
  • [6] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
    Tamura, Satoshi
    Ishikawa, Masato
    Hashiba, Takashi
    Takeuchi, Shin'ichi
    Hayamizu, Satoru
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
  • [7] Audio-visual speech recognition using deep bottleneck features and high-performance lipreading
    Tamura, Satoshi
    Ninomiya, Hiroshi
    Kitaoka, Norihide
    Osuga, Shin
    Iribe, Yurie
    Takeda, Kazuya
    Hayamizu, Satoru
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 575 - 582
  • [8] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [9] An audio-visual speech recognition with a new mandarin audio-visual database
    Liao, Wen-Yuan
    Pao, Tsang-Long
    Chen, Yu-Te
    Chang, Tsun-Wei
    INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
  • [10] Improving Audio-Visual Speech Recognition Using Gabor Recurrent Neural Networks
    Saudi, Ali S.
    Khalil, Mahmoud I.
    Abbas, Hazem M.
    MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2018, 2019, 11377 : 71 - 83