Audio-visual speech recognition using convolutive bottleneck networks for a person with severe hearing loss

Authors
Takashima, Yuki [1]
Kakihara, Yasuhiro [1]
Aihara, Ryo [1]
Takiguchi, Tetsuya [1]
Ariki, Yasuo [1]
Mitani, Nobuyuki [2]
Omori, Kiyohiro [2]
Nakazono, Kaoru [2]
Affiliations
[1] Graduate School of System Informatics, Kobe University, Kobe, Hyogo, 657-8501, Japan
[2] Hyogo Institute of Assistive Technology, Kobe, Hyogo, 651-2134, Japan
Keywords
Assistive technology - Audio visual speech recognition - Conventional methods - Feature extraction methods - Lip reading - Multi-modal - Robust feature extractions - Speaker independent model
DOI
10.2197/ipsjtcva.7.64
Abstract
In this paper, we propose an audio-visual speech recognition system for a person with an articulation disorder resulting from severe hearing loss. In the case of a person with this type of articulation disorder, the speech style is quite different from that of people without hearing loss, with the result that a speaker-independent model for unimpaired persons is hardly useful for recognizing it. We investigate an audio-visual speech recognition system for a person with severe hearing loss in noisy environments, where a robust feature extraction method using a convolutive bottleneck network (CBN) is applied to audio-visual data. We confirmed the effectiveness of this approach through word-recognition experiments in noisy environments, where the CBN-based feature extraction method outperformed the conventional methods. © 2015 Information Processing Society of Japan.
Pages: 64-68