Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Authors
Bratoszewski, Piotr [1]
Szwoch, Grzegorz [1]
Czyzewski, Andrzej [1]
Affiliation
[1] Gdansk University of Technology, Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics, Gdansk, Poland
Keywords
voice activity detection; automatic speech recognition; visual speech recognition
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
The problem of accurately differentiating between speaker utterances and noise segments in a speech signal is considered. The paper presents how applying voice activity detection to speech signals influences the accuracy of an automatic speech recognition (ASR) system. The examined voice activity detection methods are based on the acoustic and visual modalities, and voice activity is detected in both clean and noisy speech. The speech signal was recorded in a real-life scenario in an office-like environment, with babble noise played through loudspeakers at different levels. The proposed visual voice activity detection method is aimed at improving ASR accuracy when the signal-to-noise ratio is low. English numerals are used as the speech material, and the Word Error Rate (WER) is employed for evaluation.
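WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the recognizer output into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes it with a standard word-level Levenshtein alignment. It is an illustrative assumption, not the authors' scoring tool: the function name `word_error_rate`, the whitespace tokenization, and the numeral strings in the usage example are all hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic word-level WER sketch (hypothetical helper, not the paper's tool)."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    # total edits (S + D + I) normalized by the reference length N
    return prev[len(hyp)] / len(ref)


# Hypothetical numeral utterance: one substitution ("too") and one deletion ("four")
print(word_error_rate("one two three four", "one too three"))
```

With four reference words and two errors, the example prints 0.5. Because insertions are counted, WER can exceed 1.0 when a recognizer outputs many spurious words, which is one reason VAD that suppresses noise-only segments can lower WER in babble noise.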
Pages: 287-291
Number of pages: 5