MANDARIN AUDIO-VISUAL SPEECH RECOGNITION WITH EFFECTS TO THE NOISE AND EMOTION

Cited by: 0
Authors
Pao, Tsang-Long [1]
Liao, Wen-Yuan [2]
Chen, Yu-Te [1]
Wu, Tsan-Nung [1]
Affiliations
[1] Tatung Univ, Dept Comp Sci & Engn, Taipei 104, Taiwan
[2] DeLin Inst Technol, Dept Comp Sci & Informat Engn, Tucheng City 236, Taipei County, Taiwan
Keywords
Audio-visual recognition; Feature extraction; Gaussian mixture model; K-nearest neighbour; Hidden Markov model; Weighted-discrete KNN; HIDDEN MARKOV-MODELS; SPEAKER RECOGNITION; FEATURES; EXTRACTION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a Mandarin audio-visual recognition system dealing with noisy and emotional speech signals. In the proposed approach, we extract the visual features of the lips. These features are very important to the recognition system, especially in noisy conditions or with emotional effects. In this recognition system, we propose to use the weighted-discrete KNN (WD-KNN) as the classifier and compare the results with two popular classifiers, the GMM and HMM, and evaluate their performance by applying them to a Mandarin audio-visual speech corpus. The experimental results of different classifiers at various SNR levels are presented. The results show that the WD-KNN classifier yields better recognition accuracy than the other classifiers on the Mandarin speech corpus used.
Pages: 711-723
Number of pages: 13
Related papers
50 in total
  • [21] Effects of aging on audio-visual speech integration
    Huyse, Aurelie
    Leybaert, Jacqueline
    Berthommier, Frederic
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136 (04) : 1918 - 1931
  • [22] MULTI-SCALE HYBRID FUSION NETWORK FOR MANDARIN AUDIO-VISUAL SPEECH RECOGNITION
    Wang, Jinxin
    Guo, Zhongwen
    Yang, Chao
    Li, Xiaomei
    Cui, Ziyuan
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 642 - 647
  • [23] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [24] Audio-visual speech recognition with weighted KNN-based classification in mandarin database
    Pao, Tsang-Long
    Liao, Wen-Yuan
    Chen, Yu-Te
    2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 1, PROCEEDINGS, 2007, : 39 - +
  • [25] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03):
  • [26] Audio-Visual Attention Networks for Emotion Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    AVSU'18: PROCEEDINGS OF THE 2018 WORKSHOP ON AUDIO-VISUAL SCENE UNDERSTANDING FOR IMMERSIVE MULTIMEDIA, 2018, : 27 - 32
  • [27] Deep operational audio-visual emotion recognition
    Akturk, Kaan
    Keceli, Ali Seydi
    NEUROCOMPUTING, 2024, 588
  • [28] Audio-Visual Emotion Recognition in Video Clips
    Noroozi, Fatemeh
    Marjanovic, Marina
    Njegus, Angelina
    Escalera, Sergio
    Anbarjafari, Gholamreza
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75
  • [29] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
  • [30] Audio-visual speech in noise perception in dyslexia
    van Laarhoven, Thijs
    Keetels, Mirjam
    Schakel, Lemmy
    Vroomen, Jean
    DEVELOPMENTAL SCIENCE, 2018, 21 (01)