Research on Voice Activity Detection Methods Based on Deep Learning

Cited by: 0
Authors
Bai, Ke [1]
Yan, Huaicheng [1]
Li, Hao [1]
Tang, Nanxi [1]
Sun, Jiazheng [1]
Li, Zhichen [1]
Affiliations
[1] East China Univ Sci & Technol, Key Lab Smart Mfg Energy Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
Keywords
Voice Activity Detection; Convolutional Neural Network; Long Short-Term Memory network; Attention Mechanism; ALGORITHM
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Voice Activity Detection (VAD), a crucial component of speech processing, distinguishes between speech and non-speech segments within an audio signal. By accurately identifying when speech occurs, it improves the efficiency and performance of speech processing and reduces the resources wasted on non-speech segments. This paper introduces a deep learning-based VAD model, trained end to end, that takes Log-Mel features as input and combines a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory network (BiLSTM), incorporating an attention mechanism to refine the selection and extraction of speech features. We compare our model against three baseline models proposed on the AVA-Speech dataset, and use ablation studies to validate the performance gains from the chosen sequence-processing network and from the attention module. Results on the AVA-Speech dataset show that our method achieves an accuracy (ACC) of 90% and an AUC of 0.9439, outperforming the other models and effectively fulfilling the target task.
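The abstract reports frame-level results as ACC and AUC. As a rough illustration of what these two numbers measure, the sketch below computes them for a toy set of VAD outputs. It is not the authors' evaluation code: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of frames whose thresholded score matches the 0/1 label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    if len(labels) == 0 or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and equal length")
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random speech frame scores higher
    than a random non-speech frame (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = speech frame, 0 = non-speech frame.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]

print(f"ACC = {accuracy(toy_labels, toy_scores):.4f}")
print(f"AUC = {roc_auc(toy_labels, toy_scores):.4f}")
```

ACC depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together. The pairwise loop is quadratic in the number of frames; for a full dataset a rank-based implementation would be the more practical choice.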
Pages: 1323-1328
Number of pages: 6