Speech Activity Detection on YouTube Using Deep Neural Networks

Authors
Ryant, Neville [1]
Liberman, Mark [1]
Yuan, Jiahong [1]
Affiliation
[1] Linguistic Data Consortium, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA)
Keywords
speech activity detection; voice activity detection; segmentation; deep neural networks
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech activity detection (SAD) is an important first step in speech processing. Commonly used methods (e.g., frame-level classification using Gaussian mixture models (GMMs)) work well under stationary noise conditions. However, they do not generalize well to domains such as YouTube, where videos may exhibit a diverse range of environmental conditions. One solution is to augment the conventional cepstral features with additional hand-engineered features (e.g., spectral flux, spectral centroid, multiband spectral entropies) that are robust to changes in environment and recording conditions. An alternative approach, explored here, is to learn robust features during training using an appropriate architecture such as a deep neural network (DNN). In this paper we demonstrate that a DNN whose input consists of multiple frames of mel-frequency cepstral coefficients (MFCCs) yields a drastically lower frame-wise error rate on YouTube videos (19.6%) than a conventional GMM-based system (40%).
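The abstract describes two concrete ingredients. First, the DNN input is built from multiple consecutive MFCC frames. Second, systems are compared by frame-wise error rate.

The following is a minimal plain-Python sketch of both ideas under stated assumptions, not the authors' implementation:

- The names `splice_frames` and `frame_error_rate` are hypothetical.
- The context width is a free parameter, because the abstract does not say how many frames the paper stacks.
- Frames at the utterance edges are padded by repeating the first or last frame.
- The error rate is assumed to be misclassified frames divided by total frames, with labels 1 = speech and 0 = non-speech.

```python
from typing import List, Sequence

Frame = List[float]


def splice_frames(frames: Sequence[Frame], context: int) -> List[Frame]:
    """Concatenate each frame with its `context` left and right neighbours.

    Hypothetical helper: edge frames are padded by repeating the first/last
    frame (an assumption; other padding schemes are possible).
    """
    if context < 0:
        raise ValueError("context must be non-negative")
    n = len(frames)
    spliced: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-context, context + 1):
            # Clamp the index so positions outside the utterance reuse the edge frame.
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def frame_error_rate(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Fraction of frames whose speech/non-speech label is wrong (assumed definition)."""
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have equal length")
    if not reference:
        raise ValueError("need at least one frame")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


if __name__ == "__main__":
    # Three toy 2-dimensional "MFCC" frames; real MFCC vectors have more coefficients.
    mfccs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(splice_frames(mfccs, context=1)[0])  # [1.0, 2.0, 1.0, 2.0, 3.0, 4.0]
    print(frame_error_rate([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))  # 0.2
```

With a context of k frames on each side, every spliced input vector is (2k + 1) times the MFCC dimension. This wider window is what lets a classifier see temporal context that a per-frame GMM does not.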
Pages: 728-731
Number of pages: 4