Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Cited by: 0
Authors
Wang, Weiqing [1]
Wu, Haiwei [1,2]
Li, Ming [1]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
Keywords
intoxicated speech detection; Convolutional Neural Network; computational paralinguistics; ALCOHOL-INTOXICATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Alcohol intoxication can affect people both physically and psychologically, and it also alters their speech. However, detecting the intoxicated state from speech is a challenging task. In this paper, we first implement a baseline model with ComParE features and then explore the influence of speaker information on the intoxication detection task. In addition, we apply a ResNet18-based model to this task. The model contains three parts: a representation learning sub-network built on an 18-layer deep residual neural network (ResNet), a global average pooling (GAP) layer, and a classifier consisting of two fully connected layers. Since speaker z-normalization cannot be performed on variable-length feature input, we employ batch z-normalization to train the proposed model, which yields an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide improvements of 4.9% and 3.8%, respectively. These results indicate that speaker normalization can improve the performance of both the baseline model and the proposed model.
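The two normalization schemes can be contrasted with a small NumPy sketch. Speaker z-normalization is taken here as per-speaker, per-dimension standardization of fixed-length utterance-level feature vectors such as ComParE functionals. The abstract does not specify how batch z-normalization computes its statistics, so this sketch assumes one plausible reading: a single mean and standard deviation per feature dimension, pooled over all frames of all utterances in a mini-batch (the batch may be grouped by speaker, as the title suggests, but that is an assumption). The function names, the `eps` parameter, and the demo data are hypothetical. A global average pooling helper is included to show why the network can accept variable-length input: averaging over the non-channel axes always yields a fixed-length vector.

```python
import numpy as np


def speaker_z_norm(features, speaker_ids, eps=1e-8):
    """Z-normalize fixed-length utterance features (e.g. ComParE functionals)
    separately for each speaker: subtract that speaker's per-dimension mean
    and divide by that speaker's per-dimension standard deviation."""
    features = np.asarray(features, dtype=np.float64)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0)
        out[mask] = (features[mask] - mu) / (sigma + eps)
    return out


def batch_z_norm(utterances, eps=1e-8):
    """Hypothetical batch z-normalization for variable-length inputs.
    Each element of `utterances` has shape (num_frames_i, feat_dim).
    All frames in the mini-batch are pooled to compute one mean and one std
    per feature dimension, which are then applied to every utterance."""
    frames = np.concatenate(utterances, axis=0)
    mu = frames.mean(axis=0)
    sigma = frames.std(axis=0)
    return [(u - mu) / (sigma + eps) for u in utterances]


def global_average_pool(feature_map):
    """Average a (channels, ...) feature map over every axis except the
    channel axis, giving a fixed-length vector whatever the input length."""
    return feature_map.reshape(feature_map.shape[0], -1).mean(axis=1)


# Usage on random data (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # 6 utterances, 4 functionals each
spk = ["a", "a", "a", "b", "b", "b"]   # hypothetical speaker labels
X_spk = speaker_z_norm(X, spk)

batch = [rng.normal(size=(n, 3)) for n in (50, 80, 120)]  # variable lengths
batch_normed = batch_z_norm(batch)

pooled = global_average_pool(rng.normal(size=(64, 5, 37)))  # (C, F, T) map
print(pooled.shape)  # → (64,)
```

Note that `np.std` computes the population standard deviation (`ddof=0`) by default; whether the original work used the sample or population estimate is not stated.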
Pages: 1323-1327
Number of pages: 5