Perception and classification of emotions in nonsense speech: Humans versus machines

Cited by: 2
Authors
Parada-Cabaleiro, Emilia [1, 2, 3]
Batliner, Anton [3]
Schmitt, Maximilian [3]
Schedl, Markus [1, 2]
Costantini, Giovanni [4]
Schuller, Bjoern [3, 5]
Affiliations
[1] Johannes Kepler Univ Linz, Inst Computat Percept, Linz, Austria
[2] Linz Inst Technol LIT, Human Ctr Grp, Linz, Austria
[3] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[4] Univ Roma Tor Vergata, Dept Elect Engn, Rome, Italy
[5] Imperial Coll London, GLAM Grp Language, Audio & Mus, London, England
Source
PLOS ONE | 2023 / Volume 18 / Issue 01
Funding
Austrian Science Fund; EU Horizon 2020
Keywords
RECOGNITION; PROSODY; PITCH;
DOI
10.1371/journal.pone.0281079
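Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code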
07 ; 0710 ; 09 ;
Abstract
This article contributes to a more adequate modelling of emotions encoded in speech, by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on a few emotions and disregard all others ('closed world'). Second, studies use clean (lab) data or real-life data but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different sizes of databases ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' as categories not entailed in the test set, real-life noises that mask the clear recordings, and different sizes of the training set for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations even in degraded acoustic conditions.
Pages: 26
Related papers
  • [1] The Perception of Emotions in Noisified Nonsense Speech
    Parada-Cabaleiro, Emilia
    Baird, Alice
    Batliner, Anton
    Cummins, Nicholas
    Hantke, Simone
    Schuller, Bjoern W.
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3246 - 3250
  • [2] PERCEPTION OF NONSENSE SHAPES IN HUMANS
    BOHDANECKY, Z
    BOZKOV, V
    RADILWEISS, T
    [J]. ACTIVITAS NERVOSA SUPERIOR, 1974, 16 (04): : 296 - 297
  • [3] For speech perception by humans or machines, three senses are better than one
    Bernstein, LE
    Benoit, C
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1477 - 1480
  • [4] Classifying coherent versus nonsense speech perception from EEG using linguistic speech features
    Puffay, Corentin
    Vanthornhout, Jonas
    Gillis, Marlies
    De Clercq, Pieter
    Accou, Bernd
    Van Hamme, Hugo
    Francart, Tom
    [J]. SCIENTIFIC REPORTS, 2024, 14 (01):
  • [5] Emotion, age, and gender classification in children's speech by humans and machines
    Kaya, Heysem
    Salah, Albert Ali
    Karpov, Alexey
    Frolova, Olga
    Grigorev, Aleksey
    Lyakso, Elena
    [J]. COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 268 - 283
  • [6] Speech separation in humans and machines
    Ellis, D
    [J]. 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, : 1 - 1
  • [7] Speech recognition by machines and humans
    Lippmann, RP
    [J]. SPEECH COMMUNICATION, 1997, 22 (01) : 1 - 15
  • [8] Perception of Agent Properties in Humans and Machines
    Lagerstedt, Erik
    Thill, Serge
    [J]. PERCEPTION, 2019, 48 : 124 - 124
  • [9] Perception of place-of-articulation information in natural speech by monkeys versus humans
    Sinnott, JM
    Gilmore, CS
    [J]. PERCEPTION & PSYCHOPHYSICS, 2004, 66 (08): : 1341 - 1350