RBF neural network mouth tracking for audio-visual speech recognition system

被引:0
|
作者
Hui, LE [1 ]
Seng, KP [1 ]
Tse, KM [1 ]
机构
[1] Monash Univ, Sch Engn, Selangor 46150, Malaysia
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A great interest in the research of audio-visual speech recognition (AVSR) systems is driven by the increase in the number of multimedia applications that require robust speech recognition systems. The use of visual features in AVSR is justified by both the audio and visual modality of the speech generation and the need for features that are invariant to acoustic noise perturbation. The performance of the AVSR system relies on a robust set of visual features obtained from the accurate detection and tracking of the mouth region. Therefore the mouth tracking plays a major role in A VSR systems. This paper presents an improvement version of mouth tracking technique using radial basis function neural network (RBF NN) with its applications to A VSR systems. A modified extended Kalman filter (EKF) is used to adjust the parameters of the RBF NN. Simulation results have revealed good performance of the proposed method.
引用
收藏
页码:A84 / A87
页数:4
相关论文
共 50 条
  • [1] Audio-Visual Speech Recognition System Using Recurrent Neural Network
    Goh, Yeh-Huann
    Lau, Kai-Xian
    Lee, Yoon-Ket
    [J]. PROCEEDINGS OF THE 2019 4TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (INCIT): ENCOMPASSING INTELLIGENT TECHNOLOGY AND INNOVATION TOWARDS THE NEW ERA OF HUMAN LIFE, 2019, : 38 - 43
  • [2] Audio-Visual (Multimodal) Speech Recognition System Using Deep Neural Network
    Paulin, Hebsibah
    Milton, R. S.
    JanakiRaman, S.
    Chandraprabha, K.
    [J]. JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 3963 - 3974
  • [3] RECURRENT NEURAL NETWORK TRANSDUCER FOR AUDIO-VISUAL SPEECH RECOGNITION
    Makino, Takaki
    Liao, Hank
    Assael, Yannis
    Shillingford, Brendan
    Garcia, Basilio
    Braga, Otavio
    Siohan, Olivier
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 905 - 912
  • [4] Lip Tracking Method for the System of Audio-Visual Polish Speech Recognition
    Kubanek, Mariusz
    Bobulski, Janusz
    Adrjanowicz, Lukasz
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2012, 7267 : 535 - 542
  • [5] An audio-visual speech recognition system for testing new audio-visual databases
    Pao, Tsang-Long
    Liao, Wen-Yuan
    [J]. VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +
  • [6] Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition
    Wu, Gin-Der
    Tsai, Hao-Shu
    [J]. 2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, : 210 - 214
  • [7] Robustness of a chaotic modal neural network applied to audio-visual speech recognition
    Kabre, H
    [J]. NEURAL NETWORKS FOR SIGNAL PROCESSING VII, 1997, : 607 - 616
  • [8] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [9] Lips Detection for Audio-Visual Speech Recognition System
    Chin, Siew Wen
    Ang, Li-Minn
    Seng, Kah Phooi
    [J]. 2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS SYSTEMS (ISPACS 2008), 2008, : 311 - 314
  • [10] An audio-visual speech recognition with a new mandarin audio-visual database
    Liao, Wen-Yuan
    Pao, Tsang-Long
    Chen, Yu-Te
    Chang, Tsun-Wei
    [J]. INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +