Speech Activity Detection on YouTube Using Deep Neural Networks

Cited by: 0

Authors:

Ryant, Neville [1]

Liberman, Mark [1]

Yuan, Jiahong [1]

Affiliations:

[1] Linguistic Data Consortium, Philadelphia, PA 19104 USA

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Funding:

National Science Foundation (USA)

Keywords:

speech activity detection; voice activity detection; segmentation; deep neural networks

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech activity detection (SAD) is an important first step in speech processing. Commonly used methods (e.g., frame-level classification using Gaussian mixture models (GMMs)) work well under stationary noise conditions, but do not generalize well to domains such as YouTube, where videos may exhibit a diverse range of environmental conditions. One solution is to augment the conventional cepstral features with additional, hand-engineered features (e.g., spectral flux, spectral centroid, multiband spectral entropies) that are robust to changes in environment and recording condition. An alternative approach, explored here, is to learn robust features during training using an appropriate architecture such as deep neural networks (DNNs). In this paper we demonstrate that a DNN whose input consists of multiple frames of Mel-frequency cepstral coefficients (MFCCs) yields a drastically lower frame-wise error rate (19.6%) on YouTube videos than a conventional GMM-based system (40%).
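The abstract mentions two ideas that a short example may make concrete: feeding a classifier "multiple frames" of MFCCs at once, and scoring it by frame-wise error rate. The following is a minimal sketch, not the authors' implementation. The function names, the context widths, and the edge-padding strategy are assumptions introduced here for illustration; the abstract does not state any of them.

```python
from typing import List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with `left` preceding and `right` following frames.

    Edge frames are padded by repeating the first or last frame
    (an assumption; the abstract does not say how edges are handled).
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window: Frame = []
        for offset in range(-left, right + 1):
            # Clamp the index so windows at the edges reuse the boundary frame.
            neighbour = min(max(t + offset, 0), last)
            window.extend(frames[neighbour])
        stacked.append(window)
    return stacked


def frame_error_rate(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames whose speech (1) / non-speech (0) label is wrong."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)
```

Under these assumptions, each stacked vector has `(left + 1 + right)` times as many coefficients as a single frame, and it is this wider vector that a frame-level classifier would receive as input.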


Pages: 728-731

Page count: 4
