Speech Estimation in Non-Stationary Noise Environments Using Timing Structures between Mouth Movements and Sound Signals

Cited by: 0
Authors
Kawashima, Hiroaki [1]
Horii, Yu [1]
Matsuyama, Takashi [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
multimodal; non-stationary noise; timing; linear dynamical system; particle filtering;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A variety of methods for audio-visual integration, which integrate audio and visual information at the level of features, states, or classifier outputs, have been proposed for robust speech recognition. However, these methods do not always fully utilize auditory information when the signal-to-noise ratio becomes low. In this paper, we propose a novel approach to estimating speech signals in noisy environments. The key idea behind this approach is to exploit clean speech candidates generated by using timing structures between mouth movements and sound signals. We first extract a pair of feature sequences from the media signals and segment each sequence into temporal intervals. Then, we construct a cross-media timing-structure model of human speech by learning the temporal relations of overlapping intervals. Based on the learned model, we generate clean speech candidates from the observed mouth movements.
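The segmentation and overlap steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-frame labels ("sil", "a", "open", "closed"), the function names, and the choice of start/end time differences as the temporal relation are all assumptions made for illustration, and the learned model and candidate generation are not reproduced.

```python
from itertools import groupby


def to_intervals(labels):
    """Collapse a per-frame label sequence into (label, start, end) intervals.

    `end` is exclusive. The labels are hypothetical stand-ins for the
    segmented feature sequences mentioned in the abstract.
    """
    intervals = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        intervals.append((label, start, start + length))
        start += length
    return intervals


def overlapping_timing(audio, visual):
    """Collect timing differences for every pair of overlapping intervals.

    Each result is (audio_label, visual_label, start_diff, end_diff), where
    the differences are audio minus visual, measured in frames.
    """
    pairs = []
    for a_label, a_start, a_end in audio:
        for v_label, v_start, v_end in visual:
            # Two half-open intervals overlap when each starts before the other ends.
            if a_start < v_end and v_start < a_end:
                pairs.append((a_label, v_label, a_start - v_start, a_end - v_end))
    return pairs


# Hypothetical frame-level labels for a short utterance.
audio_labels = ["sil", "sil", "a", "a", "a", "sil"]
visual_labels = ["closed", "open", "open", "open", "closed", "closed"]

audio_intervals = to_intervals(audio_labels)
visual_intervals = to_intervals(visual_labels)
timing = overlapping_timing(audio_intervals, visual_intervals)

for row in timing:
    print(row)
```

In this toy data the mouth opens one frame before the vowel starts, so the pair ("a", "open") has a start difference of 1. Statistics over such differences are one plausible input to a timing model, under the assumptions stated above.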
Pages: 442-445
Page count: 4
Related papers
45 in total
  • [1] DOA estimation for uncorrelated and coherent signals in non-stationary noise environments
    Gholipour, Atefeh
    Zakeri, Bijan
    Mafinezhad, Khalil
    [J]. INTERNATIONAL JOURNAL OF ELECTRONICS, 2020, 107 (01) : 141 - 156
  • [2] Speech enhancement for non-stationary noise environments
    Cohen, I
    Berdugo, B
    [J]. SIGNAL PROCESSING, 2001, 81 (11) : 2403 - 2418
  • [3] Noise Estimation with an Inverse Comb Filter in Non-Stationary Noise Environments
    Shimamura, Tetsuya
    Kato, Fumiya
    [J]. 2017 IEEE 60TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2017, : 156 - 159
  • [4] Mask Estimation in Non-stationary Noise Environments for Missing Feature Based Robust Speech Recognition
    Badiezadegan, Shirin
    Rose, Richard C.
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2062 - 2065
  • [5] Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech
    Norholm, Sidsel Marie
    Jensen, Jesper Rindom
    Christensen, Mads Græsbøll
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (04) : 645 - 658
  • [6] A noise-estimation algorithm for highly non-stationary environments
    Rangachari, S
    Loizou, PC
    [J]. SPEECH COMMUNICATION, 2006, 48 (02) : 220 - 231
  • [7] Single Channel Speech Enhancement for Mixed Non-stationary Noise Environments
    Singh, Sachin
    Tripathy, Manoj
    Anand, R. S.
    [J]. ADVANCES IN SIGNAL PROCESSING AND INTELLIGENT RECOGNITION SYSTEMS, 2014, 264 : 545 - 555
  • [8] A stochastic estimation of non-stationary sound signals based on elimination of background noise through vibration measurement
    Ohta, M
    Nishimura, K
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2000, E83A (01) : 158 - 161
  • [9] Robust Estimation of Non-Stationary Noise Power Spectrum for Speech Enhancement
    Mai, Van-Khanh
    Pastor, Dominique
    Aissa-El-Bey, Abdeldjalil
    Le-Bidan, Raphael
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (04) : 670 - 682
  • [10] Sparse Hidden Markov Models for Speech Enhancement in Non-Stationary Noise Environments
    Deng, Feng
    Bao, Changchun
    Kleijn, W. Bastiaan
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1973 - 1987