MANDARIN AUDIO-VISUAL SPEECH RECOGNITION WITH EFFECTS TO THE NOISE AND EMOTION

Cited by: 0
Authors
Pao, Tsang-Long [1]
Liao, Wen-Yuan [2]
Chen, Yu-Te [1]
Wu, Tsan-Nung [1]
Affiliations
[1] Tatung Univ, Dept Comp Sci & Engn, Taipei 104, Taiwan
[2] DeLin Inst Technol, Dept Comp Sci & Informat Engn, Tucheng City 236, Taipei County, Taiwan
Keywords
Audio-visual recognition; Feature extraction; Gaussian mixture model; K-nearest neighbour; Hidden Markov model; Weighted-discrete KNN; HIDDEN MARKOV-MODELS; SPEAKER RECOGNITION; FEATURES; EXTRACTION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a Mandarin audio-visual recognition system that deals with noisy and emotional speech signals. In the proposed approach, we extract the visual features of the lips. These features are very important to the recognition system, especially in noisy conditions or with emotional effects. In this recognition system, we propose using the weighted-discrete KNN (WD-KNN) as the classifier, compare its results with those of two popular classifiers, the GMM and the HMM, and evaluate their performance on a Mandarin audio-visual speech corpus. The experimental results of the different classifiers at various SNR levels are presented. The results show that the WD-KNN classifier yields better recognition accuracy than the other classifiers on the Mandarin speech corpus used.
Pages: 711-723
Number of pages: 13