Speech Activity Detection on YouTube Using Deep Neural Networks

Cited by: 0
Authors
Ryant, Neville [1]
Liberman, Mark [1]
Yuan, Jiahong [1]
Affiliations
[1] Linguistic Data Consortium, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA)
Keywords
speech activity detection; voice activity detection; segmentation; deep neural networks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech activity detection (SAD) is an important first step in speech processing. Commonly used methods (e.g., frame-level classification using Gaussian mixture models (GMMs)) work well under stationary noise conditions, but do not generalize well to domains such as YouTube, where videos may exhibit a diverse range of environmental conditions. One solution is to augment the conventional cepstral features with additional, hand-engineered features (e.g., spectral flux, spectral centroid, multiband spectral entropies) that are robust to changes in environment and recording condition. An alternative approach, explored here, is to learn robust features during training using an appropriate architecture such as deep neural networks (DNNs). In this paper we demonstrate that a DNN whose input consists of multiple frames of Mel-frequency cepstral coefficients (MFCCs) yields drastically lower frame-wise error rates (19.6%) on YouTube videos than a conventional GMM-based system (40%).
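The two technical ideas in the abstract, feeding a classifier several neighboring MFCC frames at once and scoring it by frame-wise error rate, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `stack_context` and `frame_error_rate`, the context sizes, and the edge handling (repeating the first and last frame) are assumptions chosen for illustration, and the toy one-dimensional "frames" stand in for real MFCC vectors.

```python
from typing import List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with `left` preceding and `right` following frames.

    Assumption: at the edges, the first/last frame is repeated as padding.
    """
    n = len(frames)
    stacked: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index into [0, n - 1]
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def frame_error_rate(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames whose predicted label differs from the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


# Toy example: three 1-dimensional "frames" with one frame of context per side.
toy_frames = [[1.0], [2.0], [3.0]]
stacked = stack_context(toy_frames, left=1, right=1)
# stacked == [[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]

# Labels: 1 = speech, 0 = non-speech. Two of four frames disagree.
fer = frame_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
# fer == 0.5
```

Each stacked vector would then serve as the input to a frame-level classifier. The error rates quoted in the abstract (19.6% and 40%) are fractions of this kind expressed as percentages.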
Pages: 728-731
Page count: 4