Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Authors
Wang, Weiqing [1]
Wu, Haiwei [1,2]
Li, Ming [1]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
Keywords
intoxicated speech detection; Convolutional Neural Network; computational paralinguistics; ALCOHOL-INTOXICATION;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. Detecting the intoxicated state from speech, however, remains a challenging task. In this paper, we first implement a baseline model with the ComParE feature set and then explore the influence of speaker information on the intoxication detection task. We also apply a ResNet18-based model to this task. The model contains three parts: a representation learning sub-network built on an 18-layer Deep Residual Neural Network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since speaker z-normalization cannot be performed on the variable-length feature input, we employ batch z-normalization to train the proposed model. This achieves an improvement similar to that obtained by applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide improvements of 4.9% and 3.8%, respectively, indicating that speaker normalization can improve the performance of both the baseline model and the proposed model.
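The two normalization schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the data layout (plain Python lists), the `EPS` constant, and the choice to pool all frames of a batch when computing statistics are assumptions made only for illustration. The abstract does not say how batches are composed; the sketch simply assumes that a batch's pooled statistics stand in for per-speaker statistics.

```python
from collections import defaultdict
from statistics import fmean, pstdev

EPS = 1e-8  # assumed constant; guards against division by zero


def _column_stats(rows):
    """Per-dimension mean and population std over equal-width rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def _apply(row, means, stds):
    """Z-normalize one feature vector with the given statistics."""
    return [(x - m) / (s + EPS) for x, m, s in zip(row, means, stds)]


def speaker_z_normalize(features, speaker_ids):
    """Normalize fixed-length utterance vectors with each speaker's own stats."""
    by_speaker = defaultdict(list)
    for vec, spk in zip(features, speaker_ids):
        by_speaker[spk].append(vec)
    stats = {spk: _column_stats(rows) for spk, rows in by_speaker.items()}
    return [_apply(vec, *stats[spk]) for vec, spk in zip(features, speaker_ids)]


def batch_z_normalize(batch):
    """Normalize variable-length utterances with stats pooled over the batch.

    `batch` is a list of utterances; each utterance is a list of frames,
    and each frame is a list of feature values of the same width.
    """
    all_frames = [frame for utt in batch for frame in utt]
    means, stds = _column_stats(all_frames)
    return [[_apply(frame, means, stds) for frame in utt] for utt in batch]
```

For example, `speaker_z_normalize([[1.0], [3.0], [100.0], [102.0]], ["a", "a", "b", "b"])` centers each speaker's values on that speaker's own mean, so the large offset between speakers "a" and "b" disappears. `batch_z_normalize` keeps each utterance's length unchanged, which is why it can be applied to variable-length input.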
Pages: 1323-1327
Page count: 5