Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Authors
Wang, Weiqing [1]
Wu, Haiwei [1, 2]
Li, Ming [1]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
Keywords
intoxicated speech detection; Convolutional Neural Network; computational paralinguistics; ALCOHOL-INTOXICATION
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. Detecting the intoxicated state from speech, however, remains a challenging task. In this paper, we first implement a baseline model with the ComParE feature set and then explore the influence of speaker information on the intoxication detection task. We also apply a ResNet18-based model to this task. The model contains three parts: a representation learning sub-network built on an 18-layer Deep Residual Neural Network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since we cannot perform speaker z-normalization on the variable-length feature input, we employ batch z-normalization to train the proposed model, which achieves an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide 4.9% and 3.8% improvements, respectively. These results indicate that speaker normalization can improve the performance of both the baseline model and the proposed model.
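The two normalization schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how batches are formed or which statistics are used, so the code assumes fixed-length feature vectors, population statistics, and a small constant `EPS` to avoid division by zero. The function names `z_normalize`, `speaker_z_normalize`, and `batch_z_normalize` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev

EPS = 1e-8  # assumed stabilizer; not taken from the paper


def z_normalize(rows):
    """Z-normalize each feature column of `rows` (a list of equal-length lists)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / (s + EPS) for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def speaker_z_normalize(features, speaker_ids):
    """Normalize each utterance with statistics from the same speaker only."""
    groups = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        groups[spk].append(idx)
    out = [None] * len(features)
    for indices in groups.values():
        normalized = z_normalize([features[i] for i in indices])
        for i, row in zip(indices, normalized):
            out[i] = row
    return out


def batch_z_normalize(batch):
    """Normalize with statistics of the current batch (assumed batch layout)."""
    return z_normalize(batch)


if __name__ == "__main__":
    feats = [[1.0, 10.0], [3.0, 30.0], [100.0, 0.0], [200.0, 4.0]]
    spk = ["a", "a", "b", "b"]
    print(speaker_z_normalize(feats, spk))
    print(batch_z_normalize(feats))
```

In this sketch, speaker normalization removes each speaker's own offset and scale, while batch normalization uses whatever utterances happen to share a batch, which is why it remains applicable when per-speaker statistics over the input are unavailable.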
Pages: 1323-1327
Number of pages: 5