Research on Voice Activity Detection Methods Based on Deep Learning

Authors
Bai, Ke [1]
Yan, Huaicheng [1]
Li, Hao [1]
Tang, Nanxi [1]
Sun, Jiazheng [1]
Li, Zhichen [1]
Affiliation
[1] East China Univ Sci & Technol, Key Lab Smart Mfg Energy Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
Keywords
Voice Activity Detection; Convolutional Neural Network; Long Short-Term Memory Network; Attention Mechanism; Algorithm
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Voice Activity Detection (VAD), a crucial component of speech processing, distinguishes speech from non-speech segments in an audio signal. By accurately identifying when speech occurs, it improves the efficiency and performance of speech processing and avoids wasting resources on non-speech segments. This paper introduces a deep learning-based VAD model, trained end to end, that takes Log-Mel features as input and combines a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory network (BiLSTM), incorporating an attention mechanism to refine the selection and extraction of speech features. We compare the model against three baseline models proposed on the AVA-Speech dataset, and use ablation studies to validate the performance gains contributed by the chosen sequence-processing network and by the attention module. On the AVA-Speech dataset, our method achieves an accuracy (ACC) of 90% and an area under the ROC curve (AUC) of 0.9439, outperforming the other models.
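The abstract reports results as ACC and AUC. As a rough illustration of how these two metrics can be computed for frame-level VAD output, the sketch below scores a list of binary speech/non-speech labels against predicted speech probabilities. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the per-frame evaluation granularity are all assumptions made for this example, since the abstract does not specify them.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of frames whose thresholded score matches the label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal in length")
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    Tied scores receive the average of the ranks they span, so a tie between
    a speech frame and a non-speech frame counts as half a correct ordering.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must be equal in length")
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one speech and one non-speech frame")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the end of the current group of tied scores.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # The group occupies 1-based ranks i+1 .. j; use their average.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1])
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical frame labels (1 = speech) and model scores.
    frame_labels = [0, 0, 1, 1]
    frame_scores = [0.1, 0.4, 0.35, 0.8]
    print("ACC:", accuracy(frame_labels, frame_scores))
    print("AUC:", roc_auc(frame_labels, frame_scores))
```

ACC depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.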
Pages: 1323-1328
Page count: 6