Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Cited by: 0
Authors
Wang, Weiqing [1]
Wu, Haiwei [1, 2]
Li, Ming [1]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
Keywords
intoxicated speech detection; Convolutional Neural Network; computational paralinguistics; ALCOHOL-INTOXICATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835
Abstract
Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. Detecting the intoxicated state from speech, however, is a challenging task. In this paper, we first implement a baseline model with the ComParE feature set and then explore the influence of speaker information on the intoxication detection task. We also apply a ResNet18-based model to this task. The model contains three parts: a representation-learning sub-network built on an 18-layer Deep Residual Neural Network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since speaker z-normalization cannot be performed on variable-length feature input, we employ batch z-normalization to train the proposed model; it achieves an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide 4.9% and 3.8% improvement, respectively. These results indicate that speaker normalization can improve the performance of both the baseline model and the proposed model.
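As an illustration of the speaker z-normalization mentioned in the abstract, the following is a minimal sketch for fixed-length, utterance-level feature vectors (the kind a functional-based feature set produces). It is not the authors' implementation: the function name `speaker_z_normalize`, the `eps` parameter, and the toy data are assumptions made for this example. It does not cover the batch z-normalization variant, whose details the abstract does not give.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def speaker_z_normalize(features, speaker_ids, eps=1e-8):
    """Z-normalize each feature dimension using per-speaker statistics.

    features: list of equal-length lists of floats (one vector per utterance).
    speaker_ids: list of speaker labels, aligned with `features`.
    eps: small constant (an assumption here) to avoid division by zero.
    """
    # Group utterance indices by speaker.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    normalized = [None] * len(features)
    for spk, indices in by_speaker.items():
        # Transpose this speaker's vectors: one column per feature dimension.
        columns = list(zip(*(features[i] for i in indices)))
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        for i in indices:
            normalized[i] = [
                (x - m) / (s + eps)
                for x, m, s in zip(features[i], means, stds)
            ]
    return normalized


# Toy data (hypothetical): two speakers with very different feature offsets.
feats = [[1.0, 10.0], [3.0, 30.0], [100.0, 5.0], [104.0, 9.0]]
spks = ["A", "A", "B", "B"]
normed = speaker_z_normalize(feats, spks)
```

After normalization, each speaker's features have roughly zero mean and unit variance per dimension, so a classifier sees deviations from a speaker's own average rather than differences between speakers.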
Pages: 1323-1327
Number of pages: 5