Perception and classification of emotions in nonsense speech: Humans versus machines

Times cited: 2
Authors
Parada-Cabaleiro, Emilia [1 ,2 ,3 ]
Batliner, Anton [3 ]
Schmitt, Maximilian [3 ]
Schedl, Markus [1 ,2 ]
Costantini, Giovanni [4 ]
Schuller, Bjoern [3 ,5 ]
Affiliations
[1] Johannes Kepler University Linz, Institute of Computational Perception, Linz, Austria
[2] Linz Institute of Technology (LIT), Human-Centered Group, Linz, Austria
[3] University of Augsburg, Chair of Embedded Intelligence for Health Care & Wellbeing, Augsburg, Germany
[4] University of Rome Tor Vergata, Department of Electronic Engineering, Rome, Italy
[5] Imperial College London, GLAM – Group on Language, Audio & Music, London, England
Source
PLOS ONE | 2023 / Vol. 18 / Issue 1
Funding
European Union Horizon 2020; Austrian Science Fund (FWF)
Keywords
RECOGNITION; PROSODY; PITCH;
DOI
10.1371/journal.pone.0281079
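Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]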
Discipline classification codes
07; 0710; 09
Abstract
This article contributes to a more adequate modelling of emotions encoded in speech by addressing four fallacies prevalent in traditional affective computing. First, studies concentrate on only a few emotions and disregard all others ('closed world'). Second, studies use either clean (lab) data or real-life data, but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different database sizes ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' (categories not contained in the test set), real-life noises that mask the clean recordings, and training sets of different sizes for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations, even under degraded acoustic conditions.
Pages: 26