A Cross-Corpus Speech-Based Analysis of Escalating Negative Interactions

Cited by: 3
Authors
Lefter, Iulia [1]
Baird, Alice [2]
Stappen, Lukas [2]
Schuller, Björn W. [2,3]
Affiliations
[1] Delft Univ Technol, Dept Multiactor Syst, Delft, Netherlands
[2] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[3] Imperial Coll London, Grp Language Audio & Mus, London, England
Keywords
affective computing; negative interactions; cross-corpora analysis; conflict escalation; speech paralinguistics; emotion recognition; ACOUSTIC EMOTION RECOGNITION; CONFLICT;
DOI
10.3389/fcomp.2022.749804
CLC number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The monitoring of an escalating negative interaction has several benefits, particularly in security, (mental) health, and group management. The speech signal is particularly suited to this task, as aspects of escalation, including emotional arousal, are readily captured in the audio signal. A challenge of applying trained systems in real-life applications is their strong dependence on the training material and their limited generalization ability. For this reason, in this contribution we perform an extensive analysis of three corpora in the Dutch language. All three corpora are rich in escalation behavior and are annotated on alternative dimensions related to escalation. A label-mapping process yielded two possible ground-truth estimates for the three datasets, expressed as low, medium, and high escalation levels. To observe class behavior and inter-corpus differences more closely, we perform an acoustic analysis of the audio samples, finding that the derived labels behave similarly across the corpora, with escalating interactions increasing in pitch (F0) and intensity (dB). Through our experiments, we explore the suitability of different speech features, data augmentation, merging corpora for training, and testing on actor and non-actor speech. We find that the extent to which merging corpora is successful depends greatly on the similarity of the label definitions before label mapping. Finally, we show that the escalation recognition task can be performed in a cross-corpus setup with hand-crafted speech features, obtaining at best 63.8% unweighted average recall (UAR) in a cross-corpus analysis, an increase over the inter-corpus result of 59.4% UAR.
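The abstract reports its results as unweighted average recall (UAR), i.e., the mean of the per-class recalls, so each of the three escalation classes counts equally regardless of how many samples it has. As a hedged illustration (not code from the paper; the class names and toy predictions are invented), UAR can be computed as:

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class is weighted equally."""
    correct = defaultdict(int)  # per-class correctly predicted samples
    support = defaultdict(int)  # per-class total samples in y_true
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)

# Toy 3-class escalation labels (low / medium / high), purely illustrative.
y_true = ["low", "low", "low", "medium", "medium", "high"]
y_pred = ["low", "low", "medium", "medium", "high", "high"]
print(round(unweighted_average_recall(y_true, y_pred), 3))  # → 0.722
```

Because UAR averages recalls rather than pooling samples, a majority-class classifier scores only 1/3 on a three-class task, which is why it is the standard metric for the class-imbalanced corpora described here.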
Pages: 13