Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Cited by: 0
Authors
Bratoszewski, Piotr [1]
Szwoch, Grzegorz [1]
Czyzewski, Andrzej [1]
Affiliations
[1] Gdansk Univ Technol, Multimedia Syst Dept, Fac Elect Telecommun & Informat, Gdansk, Poland
Keywords
voice activity detection; automatic speech recognition; visual speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The problem of accurately differentiating between the speaker's utterances and the noise segments in a speech signal is considered. The influence of applying voice activity detection to speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined voice activity detection methods are based on the acoustic and visual modalities. Voice activity detection is studied in both clean and noisy speech. The speech signal was recorded in a real-life scenario in an office-like environment, with babble noise generated by loudspeakers at different levels. The proposed visual voice activity detection method aims to enhance ASR accuracy when the signal-to-noise ratio is low. English numerals are used as the speech material, and Word Error Rate (WER) is employed for evaluation.
Pages: 287-291
Page count: 5
Related papers
50 in total
  • [21] Combining Speech Energy and Edge Information for Fast and Efficient Voice Activity Detection in Noisy Environments
    Li, Xiaokun
    Deng, Yunbin
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 2511 - 2514
  • [22] DNN-BASED VOICE ACTIVITY DETECTION USING AUXILIARY SPEECH MODELS IN NOISY ENVIRONMENTS
    Tachioka, Yuuki
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5529 - 5533
  • [23] Comparison of Voice Activity Detectors for Interview Speech in NIST Speaker Recognition Evaluation
    Yu, Hon-Bill
    Mak, Man-Wai
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2364 - +
  • [24] Complete-Linkage Clustering for Voice Activity Detection in Audio and Visual Speech
    Ghaemmaghami, Houman
    Dean, David
    Kalantari, Shahram
    Sridharan, Sridha
    Fookes, Clinton
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2292 - 2296
  • [25] Speech recognition enhancement with statistical model-based voice activity detection
    Jarc, Bojan
    Babič, Rudolf
    Elektrotehniski Vestnik/Electrotechnical Review, 2002, 69 (01): : 75 - 81
  • [26] Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children
    Milling, Manuel
    Baird, Alice
    Bartl-Pokorny, Katrin D.
    Liu, Shuo
    Alcorn, Alyssa M.
    Shen, Jie
    Tavassoli, Teresa
    Ainger, Eloise
    Pellicano, Elizabeth
    Pantic, Maja
    Cummins, Nicholas
    Schuller, Bjoern W.
    FRONTIERS IN COMPUTER SCIENCE, 2022, 4
  • [27] SPEECH RECOGNITION WITH NO SPEECH OR WITH NOISY SPEECH
    Krishna, Gautam
    Co Tran
    Yu, Jianguo
    Tewfik, Ahmed H.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1090 - 1094
  • [28] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [29] Endpoint detection method of noisy Chinese speech recognition
    Wang, Peng
    Ta, Weina
    Chen, Shuzhong
    Jisuanji Gongcheng/Computer Engineering, 2003, 29 (17):
  • [30] Fuzzy Neural Network with Audio-Visual Data for Voice Activity Detection in Noisy Environments
    Wu, Gin-Der
    Zhu, Zhen-Wei
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT AUTONOMOUS SYSTEMS (ICOIAS), 2018, : 141 - 145