Speaker independent audio-visual speech recognition

Cited by: 0
Authors
Zhang, Y [1]
Levinson, S [1]
Huang, T [1]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
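Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract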
We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events collaboratively, so that inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, the modeling of inter-modal correlations, and the extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise from a noise database. Consistent improvements in word recognition accuracy (WRA) are achieved using a cross-validation scheme over different signal-to-noise ratios.
Pages: 1073-1076
Page count: 4