Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Cited by: 0

Authors:

Wang, Weiqing [1]

Wu, Haiwei [1, 2]

Li, Ming [1]

Affiliations:

[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China

[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Keywords:

intoxicated speech detection; Convolutional Neural Network; computational paralinguistics; alcohol intoxication

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. However, detecting the intoxicated state from speech is a challenging task. In this paper, we first implement a baseline model with the ComParE feature set and then explore the influence of speaker information on the intoxication detection task. In addition, we apply a ResNet18-based model to this task. The model contains three parts: a representation learning sub-network built on an 18-layer Deep Residual Neural Network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since we cannot perform speaker z-normalization on variable-length feature input, we employ batch z-normalization to train the proposed model. It achieves an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide 4.9% and 3.8% improvement, respectively. These results indicate that speaker normalization can improve the performance of both the baseline model and the proposed model.
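The abstract does not say how the normalization statistics are computed or how batches are formed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that speaker z-normalization standardizes each feature dimension over all fixed-length utterance vectors of one speaker, and that batch z-normalization pools frame-level statistics over all variable-length utterances in a batch. The function names `speaker_z_normalize` and `batch_z_normalize` and the `eps` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def speaker_z_normalize(features, speaker_ids, eps=1e-8):
    """Z-normalize fixed-length feature vectors separately for each speaker.

    features: list of equal-length lists of floats, one vector per utterance.
    speaker_ids: list of speaker labels aligned with `features`.
    """
    # Group utterance indices by speaker.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    normalized = [None] * len(features)
    for indices in by_speaker.values():
        # Transpose so that each column holds one feature dimension.
        columns = list(zip(*(features[i] for i in indices)))
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        for i in indices:
            normalized[i] = [
                (x - m) / (s + eps)
                for x, m, s in zip(features[i], means, stds)
            ]
    return normalized


def batch_z_normalize(batch, eps=1e-8):
    """Z-normalize variable-length utterances with batch-pooled statistics.

    batch: list of utterances; each utterance is a list of frames, and each
    frame is a list of floats with the same dimensionality.
    ASSUMPTION: statistics are pooled over every frame in the batch.
    """
    all_frames = [frame for utt in batch for frame in utt]
    columns = list(zip(*all_frames))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [
            [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
            for frame in utt
        ]
        for utt in batch
    ]


# Toy data: two speakers with two utterances each (fixed-length vectors).
feats = [[1.0, 10.0], [3.0, 30.0], [5.0, 5.0], [7.0, 9.0]]
speakers = ["a", "a", "b", "b"]
spk_norm = speaker_z_normalize(feats, speakers)

# Toy batch: two utterances of different lengths, one feature per frame.
toy_batch = [[[0.0], [2.0]], [[4.0], [6.0], [3.0]]]
batch_norm = batch_z_normalize(toy_batch)
```

Under this reading, the batch version needs no per-speaker statistics over fixed-size inputs, which is why it can be applied when utterances differ in length. If each batch were drawn from a single speaker, the two procedures would play a similar role, but the abstract does not confirm that batching strategy.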


Pages: 1323-1327

Page count: 5

50 related records in total

[1] Generalized Batch Normalization: Towards Accelerating Deep Neural Networks
Yuan, Xiaoyong
Feng, Zheng
Norton, Matthew
Li, Xiaolin
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 1682 - 1689
[2] Improving Batch Normalization with Skewness Reduction for Deep Neural Networks
Ding, Pak Lun Kevin
Martin, Sarah
Li, Baoxin
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7165 - 7172
[3] On Centralization and Unitization of Batch Normalization for Deep ReLU Neural Networks
Fei, Wen
Dai, Wenrui
Li, Chenglin
Zou, Junni
Xiong, Hongkai
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 2827 - 2841
[4] Nonlinear normalization of input patterns to speaker variability in speech recognition neural networks
Nejadgholi, Isar
Seyyedsalehi, Seyyed Ali
NEURAL COMPUTING & APPLICATIONS, 2009, 18 (01): : 45 - 55
[5] Nonlinear normalization of input patterns to speaker variability in speech recognition neural networks
Isar Nejadgholi
Seyyed Ali Seyyedsalehi
Neural Computing and Applications, 2009, 18 : 45 - 55
[6] Deep neural networks for speaker verification with short speech utterances
Yang, Il-Ho
Heo, Hee-Soo
Yoon, Sung-Hyun
Yu, Ha-Jin
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2016, 35 (06): : 501 - 509
[7] Speech Separation of A Target Speaker Based on Deep Neural Networks
Du Jun
Tu Yanhui
Xu Yong
Dai Lirong
Chin-Hui, Lee
2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 473 - 477
[8] DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION
Tang, Yun
Mohan, Aanchan
Rose, Richard C.
Ma, Chengyuan
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[9] Deep Neural Networks for Multiple Speaker Detection and Localization
He, Weipeng
Motlicek, Petr
Odobez, Jean-Marc
2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 74 - 79
[10] Correlation Networks for Speaker Normalization in Automatic Speech Recognition
Sharon, Rini A.
Kothinti, Sandeep Reddy
Umesh, Srinivasan
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 882 - 886
