Computationally-efficient voice activity detection based on deep neural networks

Cited by: 1
Authors
Xiong, Yan [1]
Berisha, Visar [1]
Chakrabarti, Chaitali [1]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
Keywords
voice activity detection; deep neural network; capsule network; low-power architecture
DOI
10.1109/SiPS52927.2021.00020
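Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract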
Voice activity detection (VAD) is among the first preprocessing steps in most speech processing applications. While several very low-power analog solutions exist, more recent deep neural network (DNN) based solutions achieve superior VAD performance even in complex noisy backgrounds, at the expense of increased computation. In this paper, we propose a computationally efficient network architecture, ResCap+, for high-performance VAD. ResCap+ operates on small-sized sequences and is built with residual blocks in a convolutional neural network to encode the characteristics of the input spectrum, and a capsule network with LSTM cells to capture the temporal relationship between these sequences. We evaluate the model using the AMI meeting corpus and show that it outperforms a state-of-the-art DNN-based model on accuracy with approximately 55× less computation cost. We also present initial hardware performance results on a low-power programmable architecture, Transmuter, and show that it can process every 40 ms input audio sequence with a delay of 15.17 ms, resulting in real-time performance.
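The real-time claim in the abstract comes down to simple arithmetic: a segment is handled in real time when its processing delay is shorter than the audio it covers. The sketch below illustrates that check using only the two figures quoted in the abstract (40 ms segments, 15.17 ms delay). The function name and the "headroom" quantity are illustrative assumptions, not terms from the paper.

```python
def real_time_factor(processing_ms: float, segment_ms: float) -> float:
    """Ratio of processing time to audio duration; below 1.0 means real time."""
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return processing_ms / segment_ms


SEGMENT_MS = 40.0      # input audio sequence length quoted in the abstract
PROCESSING_MS = 15.17  # processing delay on Transmuter quoted in the abstract

rtf = real_time_factor(PROCESSING_MS, SEGMENT_MS)
headroom_ms = SEGMENT_MS - PROCESSING_MS  # time left before the next segment arrives

print(f"real-time factor: {rtf:.3f}")
print(f"headroom per segment: {headroom_ms:.2f} ms")
```

A factor below 1.0 means each segment finishes before the next one is available, which is the sense in which the reported delay gives real-time operation.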
Pages: 64-69
Number of pages: 6