Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Cited by: 0

Authors:

Wang, Weiqing [1]

Wu, Haiwei [1,2]

Li, Ming [1]

Affiliations:

[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China

[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Keywords:

intoxicated speech detection; Convolutional Neural Network; computational paralinguistics; ALCOHOL-INTOXICATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. However, detecting the intoxicated state from speech is a challenging task. In this paper, we first implement a baseline model with ComParE features and then explore the influence of speaker information on the intoxication detection task. We also apply a ResNet18-based model to this task. The model contains three parts: a representation learning sub-network built on an 18-layer deep residual network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since speaker z-normalization cannot be performed on variable-length feature input, we employ batch z-normalization to train the proposed model, which achieves an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide improvements of 4.9% and 3.8%, respectively. These results indicate that speaker normalization improves the performance of both the baseline model and the proposed model.
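
The two normalization schemes named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, and it assumes that (a) speaker z-normalization is applied per feature dimension to fixed-length utterance-level vectors grouped by speaker, and (b) batch z-normalization is applied to fixed-length representations (for example, after pooling) using only the statistics of the current batch. If each batch happened to hold utterances from a single speaker, the two would coincide, which is one way to read the "similar improvement" the abstract reports.

```python
import numpy as np


def z_normalize(features, eps=1e-8):
    """Scale each feature dimension (column) to zero mean and unit variance."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def speaker_z_normalize(features, speaker_ids):
    """Z-normalize fixed-length utterance vectors separately for each speaker.

    Assumption: one row per utterance, all rows the same length.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        out[mask] = z_normalize(features[mask])
    return out


def batch_z_normalize(batch_vectors):
    """Z-normalize a batch using the statistics of that batch only.

    Assumption: rows are fixed-length representations of the utterances
    in one training batch.
    """
    return z_normalize(np.asarray(batch_vectors, dtype=float))
```

With these definitions, calling `batch_z_normalize` on the rows of one speaker gives the same result as `speaker_z_normalize` gives for those rows, so the batch variant can stand in for speaker normalization when per-speaker statistics over raw variable-length input are not available.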

Pages: 1323-1327

Page count: 5
