Computationally-efficient voice activity detection based on deep neural networks

Cited by: 1
Authors
Xiong, Yan [1]
Berisha, Visar [1]
Chakrabarti, Chaitali [1]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
Keywords
voice activity detection; deep neural network; capsule network; low-power architecture
DOI
10.1109/SiPS52927.2021.00020
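Chinese Library Classification (CLC) number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract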
Voice activity detection (VAD) is among the first preprocessing steps in most speech processing applications. While there are several very low-power analog solutions, the more recent deep neural network (DNN) based solutions have superior VAD performance even in complex noisy backgrounds, at the expense of increased computation. In this paper, we propose a computationally-efficient network architecture, ResCap+, for high-performance VAD. ResCap+ operates on small-sized sequences and is built with residual blocks in a convolutional neural network to encode the characteristics of the input spectrum, and a capsule network with LSTM cells to capture the temporal relationship between these sequences. We evaluate the model using the AMI meeting corpus and show that it outperforms a state-of-the-art DNN-based model on accuracy with approximately 55× less computation cost. We also present initial hardware performance results on a low-power programmable architecture, Transmuter, and show that it can process every 40 ms input audio sequence with a delay of 15.17 ms, resulting in real-time performance.
Pages: 64-69
Page count: 6