Computationally-efficient voice activity detection based on deep neural networks

Cited by: 1
Authors
Xiong, Yan [1]
Berisha, Visar [1]
Chakrabarti, Chaitali [1]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
Keywords
voice activity detection; deep neural network; capsule network; low-power architecture
DOI
10.1109/SiPS52927.2021.00020
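Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract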
Voice activity detection (VAD) is among the first preprocessing steps in most speech processing applications. While several very low-power analog solutions exist, more recent deep neural network (DNN) based solutions achieve superior VAD performance even in complex noisy backgrounds, at the expense of increased computation. In this paper, we propose a computationally efficient network architecture, ResCap+, for high-performance VAD. ResCap+ operates on small-sized sequences and is built with residual blocks in a convolutional neural network to encode the characteristics of the input spectrum, and a capsule network with LSTM cells to capture the temporal relationship between these sequences. We evaluate the model using the AMI meeting corpus and show that it outperforms a state-of-the-art DNN-based model on accuracy with approximately 55× less computation cost. We also present initial hardware performance results on a low-power programmable architecture, Transmuter, and show that it can process every 40 ms input audio sequence with a delay of 15.17 ms, resulting in real-time performance.
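The real-time claim in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes sequences are processed one at a time without overlap and that the reported 15.17 ms covers all processing for one 40 ms sequence. The function and variable names are hypothetical and do not come from the paper.

```python
# Figures taken from the abstract; names are illustrative assumptions.
FRAME_MS = 40.0      # duration of each input audio sequence
LATENCY_MS = 15.17   # reported processing delay per sequence on Transmuter


def real_time_factor(latency_ms: float, frame_ms: float) -> float:
    """Return processing time divided by audio duration.

    A value below 1.0 means each sequence is finished before the
    next one has been fully captured, i.e. real-time operation.
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return latency_ms / frame_ms


rtf = real_time_factor(LATENCY_MS, FRAME_MS)
slack_ms = FRAME_MS - LATENCY_MS  # idle time left per sequence

print(f"real-time factor: {rtf:.3f}")            # real-time factor: 0.379
print(f"slack per sequence: {slack_ms:.2f} ms")  # slack per sequence: 24.83 ms
```

Under these assumptions the processor is busy for roughly 38% of each sequence period, which is consistent with the abstract's statement of real-time performance.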
Pages: 64-69
Number of pages: 6