Bayesian separation of audio-visual speech sources

Authors
Rajaram, S [1]
Nefian, AV [1]
Huang, TS [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate the use of audio and visual features, rather than audio features alone, for the task of speech separation in acoustically noisy environments. Existing independent component analysis (ICA) systems can separate a large variety of signals, including speech, but their success is often limited by how well the technique handles noise. We introduce a Bayesian model for the mixing process that describes both the bimodality and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both ICA and the audio-only Bayesian model at all levels of noise.
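The noise limitation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' Bayesian model; it assumes a generic two-source linear instantaneous mixture x_t = A s_t + n_t, and the mixing matrix, the synthetic sources, and all function names are hypothetical. Even with the ideal demixing matrix W = A^-1, which a noise-free ICA could at best recover, the estimate is s_t + W n_t, so the error grows with the noise level.

```python
import math
import random


def mix(sources, A, noise_std, rng):
    """Return noisy mixtures x_t = A s_t + n_t for two sources (assumed model)."""
    mixed = []
    for s1, s2 in sources:
        x1 = A[0][0] * s1 + A[0][1] * s2 + rng.gauss(0.0, noise_std)
        x2 = A[1][0] * s1 + A[1][1] * s2 + rng.gauss(0.0, noise_std)
        mixed.append((x1, x2))
    return mixed


def invert_2x2(A):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def demix(mixed, W):
    """Apply the linear demixing matrix W to every mixture sample."""
    return [(W[0][0] * x1 + W[0][1] * x2, W[1][0] * x1 + W[1][1] * x2)
            for x1, x2 in mixed]


def mse(estimates, sources):
    """Mean squared error between estimated and true sources."""
    total = sum((e1 - s1) ** 2 + (e2 - s2) ** 2
                for (e1, e2), (s1, s2) in zip(estimates, sources))
    return total / (2 * len(sources))


def demo(noise_std, n=2000, seed=0):
    """Demix with the ideal W = A^-1 and report the residual error."""
    rng = random.Random(seed)
    # Hypothetical stand-ins for speech: a sinusoid and a uniform random signal.
    sources = [(math.sin(0.05 * t), rng.uniform(-1.0, 1.0)) for t in range(n)]
    A = [[1.0, 0.6], [0.4, 1.0]]  # hypothetical mixing matrix
    W = invert_2x2(A)
    return mse(demix(mix(sources, A, noise_std, rng), W), sources)


if __name__ == "__main__":
    for std in (0.0, 0.1, 0.5):
        print(f"noise_std={std}: mse={demo(std):.4f}")
```

Because the demixing here is purely linear, the residual error scales with the noise variance. A model that places priors on the sources and on the noise, as the abstract describes, is one way to go beyond this inverse-only estimate.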
Pages: 657-660
Page count: 4