Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Authors
Bratoszewski, Piotr [1]
Szwoch, Grzegorz [1]
Czyzewski, Andrzej [1]
Affiliations
[1] Gdansk Univ Technol, Multimedia Syst Dept, Fac Elect Telecommun & Informat, Gdansk, Poland
Keywords
voice activity detection; automatic speech recognition; visual speech recognition
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The problem of accurately differentiating between the speaker's utterance and the noise parts of a speech signal is considered. The influence of applying voice activity detection to speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined voice activity detection methods are based on the acoustic and visual modalities, and detection is studied in both clean and noisy speech. The speech signal was recorded in a real-life scenario, in an office-like environment with babble noise played through loudspeakers at different levels. The proposed method of visual voice activity detection is aimed at enhancing ASR accuracy when the signal-to-noise ratio is low. English numerals are used as the speech material, and Word Error Rate (WER) is employed for evaluation.
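The abstract names Word Error Rate as the evaluation metric but does not say how it was computed. A minimal sketch of the conventional definition, WER = (substitutions + deletions + insertions) / number of reference words, is given below. It assumes whitespace tokenization and case-sensitive word matching; the function name `word_error_rate` and the sample strings are illustrative and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: whitespace tokenization and exact,
    case-sensitive matching are assumptions, not the paper's setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("one two three four", "one three three"))
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many spurious words, for example when noise segments are passed to the recognizer as if they were speech.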
Pages: 287-291
Page count: 5
Related papers
50 in total
  • [1] Voice Activity Detection for Children's Read Speech Recognition in Noisy Conditions
    Pasad, Ankita
    Sabu, Kamini
    Rao, Preeti
    2017 TWENTY-THIRD NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2017,
  • [2] Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition
    Song, Taeyup
    Lee, Kyungsun
    Kim, Sung Soo
    Lee, Jae-Won
    Ko, Hanseok
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2015, 34 (04): : 321 - 327
  • [3] Robust Voice Activity Detection Algorithm for Noisy Speech
    Verteletskaya, Ekaterina
    Simak, Boris
    RTT 2009: 11TH INTERNATIONAL CONFERENCE RTT 2009 RESEARCH IN TELECOMMUNICATION TECHNOLOGY, CONFERENCE PROCEEDINGS, 2009, : 98 - 101
  • [4] An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition
    Yoshida, Takami
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT I, PROCEEDINGS, 2010, 6096 : 51 - +
  • [5] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
    Tamura, Satoshi
    Ishikawa, Masato
    Hashiba, Takashi
    Takeuchi, Shin'ichi
    Hayamizu, Satoru
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
  • [6] Bispectrum estimators for voice activity detection and speech recognition
    Górriz, JM
    Puntonet, CG
    Ramírez, J
    Segura, JC
    NONLINEAR ANALYSES AND ALGORITHMS FOR SPEECH PROCESSING, 2005, 3817 : 174 - 185
  • [7] Voice Activity Detection Method Using Psycho Acoustic Model Based on Speech Energy Maximization in Noisy Environments
    Choi, Gab-Keun
    Kim, Soon-Hyob
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2009, 28 (05): : 447 - 453
  • [8] Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment
    Babu, C. Ganesh
    Vanathi, P. T.
    Ramachandran, R.
    Rajaa, M. Senthil
    Vengatesh, R.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2010, 69 (07): : 515 - 522
  • [9] Harmonic-Based Robust Voice Activity Detection for Enhanced Low SNR Noisy Speech Recognition System
    Shih, Po-Yi
    Lin, Po-Chuan
    Wang, Jhing-Fa
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2016, E99A (11) : 1928 - 1936
  • [10] An analysis of visual speech information applied to voice activity detection
    Sodoyer, David
    Rivet, Bertrand
    Girin, Laurent
    Schwartz, Jean-Luc
    Jutten, Christian
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 601 - 604