RBF neural network mouth tracking for audio-visual speech recognition system

Citations: 0
Authors
Hui, LE [1]
Seng, KP [1]
Tse, KM [1]
Affiliation
[1] Monash Univ, Sch Engn, Selangor 46150, Malaysia
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Interest in audio-visual speech recognition (AVSR) research is driven by the growing number of multimedia applications that require robust speech recognition. The use of visual features in AVSR is justified by the fact that speech production has both audio and visual modalities, and by the need for features that are invariant to acoustic noise. The performance of an AVSR system relies on a robust set of visual features obtained from accurate detection and tracking of the mouth region, so mouth tracking plays a major role in AVSR systems. This paper presents an improved mouth tracking technique that uses a radial basis function neural network (RBF NN), with applications to AVSR systems. A modified extended Kalman filter (EKF) is used to adjust the parameters of the RBF NN. Simulation results show good performance of the proposed method.
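The abstract does not describe how the authors modify the EKF, so the following is only a minimal sketch of the general idea: a Gaussian RBF network whose output weights and centers are adjusted online by a textbook (unmodified) extended Kalman filter. The class name `RBFNetworkEKF`, its parameters, the noise settings, and the toy 1-D regression target are all illustrative assumptions; none of them come from the paper, and no image data or mouth tracking is involved.

```python
import numpy as np


class RBFNetworkEKF:
    """Gaussian RBF network with scalar output, trained by a standard EKF.

    Hypothetical illustration only: this is NOT the modified EKF of the paper.
    The EKF state is the vector [output weights, flattened centers].
    """

    def __init__(self, centers, sigma=0.8, p_w=1.0, p_c=0.01, q=1e-6, r=1e-2):
        self.centers = np.array(centers, dtype=float)  # shape (m, d)
        self.m, self.d = self.centers.shape
        self.sigma = sigma                             # fixed basis width
        self.weights = np.zeros(self.m)
        # Assumed prior variances: larger for weights, smaller for centers.
        prior = np.concatenate([np.full(self.m, p_w),
                                np.full(self.m * self.d, p_c)])
        self.P = np.diag(prior)                        # state covariance
        self.Q = q * np.eye(prior.size)                # process noise
        self.r = r                                     # measurement noise

    def _basis(self, x):
        diff = x - self.centers                        # shape (m, d)
        phi = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * self.sigma ** 2))
        return phi, diff

    def predict(self, x):
        phi, _ = self._basis(np.asarray(x, dtype=float))
        return float(self.weights @ phi)

    def update(self, x, target):
        """One EKF step on a single (input, target) pair; returns the error."""
        phi, diff = self._basis(np.asarray(x, dtype=float))
        err = target - float(self.weights @ phi)
        # Jacobian of the output with respect to [weights, centers].
        d_centers = (self.weights * phi)[:, None] * diff / self.sigma ** 2
        H = np.concatenate([phi, d_centers.ravel()])
        PH = self.P @ H
        K = PH / (float(H @ PH) + self.r)              # Kalman gain
        theta = np.concatenate([self.weights, self.centers.ravel()]) + K * err
        self.weights = theta[:self.m]
        self.centers = theta[self.m:].reshape(self.m, self.d)
        self.P = self.P - np.outer(K, PH) + self.Q     # covariance update
        return err


# Toy usage: learn y = sin(x) on [0, 2*pi] from 500 random samples.
rng = np.random.default_rng(0)
net = RBFNetworkEKF(np.linspace(0.0, 2.0 * np.pi, 8).reshape(-1, 1))
grid = np.linspace(0.0, 2.0 * np.pi, 50)


def mean_abs_error(model):
    return float(np.mean([abs(np.sin(g) - model.predict([g])) for g in grid]))


mae_before = mean_abs_error(net)
for x in rng.uniform(0.0, 2.0 * np.pi, size=500):
    net.update([x], np.sin(x))
mae_after = mean_abs_error(net)
print(f"MAE before: {mae_before:.3f}, after: {mae_after:.3f}")
```

In a tracking setting, the input would presumably be image-derived features and the target a mouth-region coordinate, but that mapping is not specified in this record and is left open here.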
Pages: A84-A87
Number of pages: 4