Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition

Authors
Song, Taeyup
Lee, Kyungsun
Kim, Sung Soo
Lee, Jae-Won
Ko, Hanseok
Keywords
Voice activity detection; End-point detection; Local variance histogram
DOI
10.7776/ASK.2015.34.4.321
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we propose an algorithm for robust Visual Voice Activity Detection (VVAD) to improve speech recognition. Conventional VVAD algorithms detect visual speech frames by measuring the motion of the lip region with optical flow or chaos-inspired measures. Optical-flow-based VVAD is difficult to adopt in driving scenarios because of its computational complexity. Chaos-theory-based VVAD, while invariant to illumination changes, is sensitive to translations caused by the driver's head movements. The proposed Local Variance Histogram (LVH) is robust to pixel intensity changes caused by both illumination changes and translation. In addition, to improve performance under changing environmental conditions, we adopt a novel threshold estimation method based on the change in total variance. In the experiments, the proposed VVAD algorithm remains robust across various driving situations.
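The abstract does not give the LVH formulation or the threshold rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a grayscale lip-region frame stored as a list of integer rows, takes the variance of every k x k patch, and histograms those variances. Frame-to-frame change is an L1 distance between histograms, and the threshold is a generic mean-plus-deviation rule standing in for the paper's total-variance-based estimator. All function names, the bin count, `max_var`, and `alpha` are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def local_variances(frame, k=3):
    """Population variance of every k x k patch of a grayscale frame."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            patch = [frame[y + dy][x + dx] for dy in range(k) for dx in range(k)]
            out.append(pvariance(patch))
    return out


def variance_histogram(frame, bins=8, max_var=4096.0, k=3):
    """Normalized histogram of local variances (assumed binning, not from the paper).

    Variances at or above max_var are counted in the last bin.
    """
    values = local_variances(frame, k)
    hist = [0] * bins
    for v in values:
        idx = min(int(v / max_var * bins), bins - 1)
        hist[idx] += 1
    return [count / len(values) for count in hist]


def histogram_change(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_speech(frames, alpha=1.0):
    """Flag frame transitions whose histogram change exceeds an adaptive threshold.

    The threshold (mean + alpha * standard deviation of the change scores) is a
    placeholder assumption. Requires at least two frames.
    """
    hists = [variance_histogram(f) for f in frames]
    scores = [histogram_change(a, b) for a, b in zip(hists, hists[1:])]
    threshold = mean(scores) + alpha * pstdev(scores)
    return [s > threshold for s in scores], threshold
```

Because the variance of a patch is unchanged when a constant is added to every pixel, a uniform brightness offset leaves this histogram unchanged, and because the histogram discards patch positions, a shifted pattern changes it mainly near the frame borders. That is one plausible reading of the robustness the abstract claims.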
Pages: 321-327
Page count: 7