Research on Voice Activity Detection Methods Based on Deep Learning

Times cited: 0
Authors
Bai, Ke [1]
Yan, Huaicheng [1]
Li, Hao [1]
Tang, Nanxi [1]
Sun, Jiazheng [1]
Li, Zhichen [1]
Affiliation
[1] East China Univ Sci & Technol, Key Lab Smart Mfg Energy Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
Keywords
Voice Activity Detection; Convolutional Neural Network; Long Short-Term Memory Network; Attention Mechanism; Algorithm
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Voice Activity Detection (VAD), a crucial component of speech processing, distinguishes speech segments from non-speech segments in an audio signal. By accurately locating speech, it improves the efficiency and performance of downstream speech processing and avoids wasting resources on non-speech segments. This paper introduces an end-to-end trained, deep learning-based VAD model that takes Log-Mel features as input and combines Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory networks (BiLSTM), adding an attention mechanism to refine the selection and extraction of speech features. We compare the model against three baseline models proposed for the AVA-Speech dataset, and ablation studies confirm that both the chosen sequence-modeling network and the attention module improve performance. On the AVA-Speech dataset, our method achieves an accuracy (ACC) of 90% and an area under the ROC curve (AUC) of 0.9439, outperforming the compared models.
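The abstract reports ACC and AUC on AVA-Speech without spelling out how they are computed. As a minimal sketch, assuming binary per-frame labels (1 = speech, 0 = non-speech), a per-frame speech probability from the model, and a decision threshold of 0.5 (an assumption; the abstract states no threshold), the two metrics can be computed as below. The function names `accuracy` and `roc_auc` are hypothetical, and this is not the authors' evaluation code.

```python
"""Frame-level VAD evaluation sketch: accuracy (ACC) and ROC AUC."""
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of frames whose thresholded score matches the 0/1 label.

    The default threshold of 0.5 is an assumption, not taken from the paper.
    """
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equally long")
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a random speech frame (label 1) scores higher
    than a random non-speech frame (label 0); ties count one half, which the
    average-rank assignment below handles.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i in sorted order.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Give every tied frame the average of their 1-based ranks.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both speech and non-speech frames")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts (speech, non-speech) pairs ranked correctly, ties as 0.5.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, both functions return 0.75. Unlike ACC, AUC does not depend on the choice of threshold, so the two metrics complement each other.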
Pages: 1323-1328
Page count: 6