Perception and classification of emotions in nonsense speech: Humans versus machines

Times cited: 2

Authors:

Parada-Cabaleiro, Emilia [1, 2, 3]

Batliner, Anton [3]

Schmitt, Maximilian [3]

Schedl, Markus [1, 2]

Costantini, Giovanni [4]

Schuller, Bjoern [3, 5]

Affiliations:

[1] Johannes Kepler Univ Linz, Inst Computat Percept, Linz, Austria

[2] Linz Inst Technol LIT, Human Ctr Grp, Linz, Austria

[3] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany

[4] Univ Roma Tor Vergata, Dept Elect Engn, Rome, Italy

[5] Imperial Coll London, GLAM Grp Language, Audio & Mus, London, England

Source:

PLOS ONE | 2023 / Vol. 18 / Issue 01

Funding:

European Union Horizon 2020; Austrian Science Fund;

Keywords:

RECOGNITION; PROSODY; PITCH;

DOI:

10.1371/journal.pone.0281079

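Chinese Library Classification (CLC) numbers:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];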

Subject classification codes:

07; 0710; 09;

Abstract:

This article contributes to a more adequate modelling of emotions encoded in speech by addressing four fallacies prevalent in traditional affective computing. First, studies concentrate on a few emotions and disregard all others ('closed world'). Second, studies use either clean (lab) data or real-life data, but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different database sizes ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' (categories not contained in the test set), real-life noises that mask the clean recordings, and training sets of different sizes for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations, even in degraded acoustic conditions.
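The abstract states that real-life noises were used to mask the clean GEMEP recordings, but it does not give the mixing procedure. A common way to build such degraded conditions is to rescale a noise recording so that the mixture reaches a chosen signal-to-noise ratio (SNR). The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes mono signals of equal length and a power-based SNR in dB, and the function names, the sampling rate, and the SNR levels are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Mean of the squared samples (average signal power)."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean`, rescaled so the mixture has the requested SNR.

    SNR is taken as 10 * log10(P_clean / P_noise), with P the mean power.
    Assumption: mono signals of equal length at the same sampling rate.
    """
    if len(clean) != len(noise) or not clean:
        raise ValueError("clean and noise must be non-empty and equally long")
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal is silent; cannot reach a finite SNR")
    # Solve P_clean / (g^2 * P_noise) = 10^(snr_db / 10) for the noise gain g.
    target_p_noise = mean_power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean: Sequence[float], mixture: Sequence[float]) -> float:
    """SNR of `mixture` relative to `clean`, treating their difference as noise."""
    residual = [m - x for x, m in zip(clean, mixture)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    sr = 16000  # hypothetical sampling rate in Hz
    # Stand-ins for a speech clip and a real-life noise recording (1 s each).
    clean = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    for level in (20.0, 5.0, -5.0):  # hypothetical SNR conditions
        mixture = mix_at_snr(clean, noise, level)
        print(f"target {level:+.0f} dB -> measured {measured_snr_db(clean, mixture):+.2f} dB")
```

Because the noise is scaled rather than the speech, the loudness of the emotional utterance stays fixed across conditions and only the masking level changes; the measured SNR should match each target up to floating-point rounding.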


Pages: 26
