Robustness of a chaotic modal neural network applied to audio-visual speech recognition

Cited by: 1
Author
Kabre, H
Keywords
robustness; chaos; audio-visual speech recognition; adaptation
DOI
10.1109/NNSP.1997.622443
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We stabilized a chaotic Modal Neural Network (MNN) for robust speech recognition. A Modal Neural Network is an artificial neural network system with two levels of information processing. The first level is trained to store and retrieve acoustic and visual patterns. The states of this network, which represent the sound classes in a speech recognition task, are called modes and are assumed to evolve chaotically when recognition is performed in adverse environments. The second level controls the chaotic behavior of the modes. An external signal taken from a visual input, such as the lip-opening parameters of the speaker, is applied to stabilize an acoustic modal network whose modes are moved from an initial position to a target position. The task addressed is audio-visual recognition of the 10 French vowels perturbed by noise. Perceptual Linear Predictive analysis of the speech signal for the 10 vowels yields vectors of 5 spectral parameters, which are in turn fed into a Modal Neural Network implemented as a feedforward network. As the noise level increases, the classes stored by the acoustic MNN exhibit chaotic behavior, which is stabilized by the signal from the visual path. We show that in an uncooperative environment, a chaotic modal neural network stabilizes well.
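The idea of an external signal pulling a chaotic state toward a target position can be illustrated with a toy example. The sketch below is not the paper's MNN: it swaps in a one-dimensional logistic map as a stand-in for a chaotic mode, and the names `run_mode`, `gain`, and `target` are hypothetical. The `gain` parameter plays the role of the coupling strength of an external (for instance visual) signal, under the assumption that the control is a simple convex blend of the free dynamics and the target.

```python
def logistic(x, r=3.9):
    """Logistic map; r = 3.9 is a commonly used chaotic setting."""
    return r * x * (1.0 - x)


def run_mode(x0, target, gain, steps=200):
    """Iterate a toy 'mode' under an external control signal.

    gain = 0 leaves the map free; gain close to 1 pins the state
    near `target`. This blend is an illustrative assumption, not
    the control law used in the paper.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append((1.0 - gain) * logistic(xs[-1]) + gain * target)
    return xs


def tail_spread(xs, n=50):
    """Range of the last n states; near zero means the state has settled."""
    tail = xs[-n:]
    return max(tail) - min(tail)


free = run_mode(x0=0.2, target=0.7, gain=0.0)        # uncontrolled, keeps wandering
controlled = run_mode(x0=0.2, target=0.7, gain=0.8)  # controlled, settles

print("free spread:      ", tail_spread(free))
print("controlled spread:", tail_spread(controlled))
print("controlled final: ", controlled[-1])
```

With `gain = 0.8` the slope of the controlled map is bounded by 0.2 × 3.9 = 0.78 in magnitude on [0, 1], so the iteration is a contraction and settles to a fixed point close to, but not exactly at, `target`.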
Pages: 607-616
Page count: 10