Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Authors
Bratoszewski, Piotr [1]
Szwoch, Grzegorz [1]
Czyzewski, Andrzej [1]
Affiliations
[1] Gdansk Univ Technol, Multimedia Syst Dept, Fac Elect Telecommun & Informat, Gdansk, Poland
Keywords
voice activity detection; automatic speech recognition; visual speech recognition
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper considers the problem of accurately distinguishing speaker utterances from noise segments in a speech signal. It presents the influence of applying voice activity detection (VAD) to speech signals on the accuracy of an automatic speech recognition (ASR) system. The examined VAD methods are based on the acoustic and visual modalities, and detection is considered for both clean and noisy speech. The speech signal was recorded in a real-life scenario, in an office-like environment with babble noise played through loudspeakers at different levels. The proposed visual VAD method aims to improve ASR accuracy when the signal-to-noise ratio is low. English numerals are used as the speech material, and Word Error Rate (WER) is used for evaluation.
Pages: 287-291
Page count: 5