Perception and classification of emotions in nonsense speech: Humans versus machines

Cited by: 4
Authors
Parada-Cabaleiro, Emilia [1, 2, 3]
Batliner, Anton [3]
Schmitt, Maximilian [3]
Schedl, Markus [1, 2]
Costantini, Giovanni [4]
Schuller, Bjoern [3, 5]
Affiliations
[1] Johannes Kepler Univ Linz, Inst Computat Percept, Linz, Austria
[2] Linz Inst Technol LIT, Human Ctr Grp, Linz, Austria
[3] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[4] Univ Roma Tor Vergata, Dept Elect Engn, Rome, Italy
[5] Imperial Coll London, GLAM Grp Language, Audio & Mus, London, England
Source
PLOS ONE | 2023 / Vol. 18 / No. 01
Funding
European Union Horizon 2020; Austrian Science Fund;
Keywords
RECOGNITION; PROSODY; PITCH;
DOI
10.1371/journal.pone.0281079
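Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification codes
07; 0710; 09;
Abstract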
This article contributes to a more adequate modelling of emotions encoded in speech, by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on a few emotions and disregard all other ones ('closed world'). Second, studies use clean (lab) data or real-life data but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different sizes of databases ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' as categories not entailed in the test set, real-life noises that mask the clear recordings, and different sizes of the training set for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations even in degraded acoustic conditions.
Pages: 26