Perception and classification of emotions in nonsense speech: Humans versus machines

Citations: 2
Authors
Parada-Cabaleiro, Emilia [1 ,2 ,3 ]
Batliner, Anton [3 ]
Schmitt, Maximilian [3 ]
Schedl, Markus [1 ,2 ]
Costantini, Giovanni [4 ]
Schuller, Bjoern [3 ,5 ]
Affiliations
[1] Johannes Kepler Univ Linz, Inst Computat Percept, Linz, Austria
[2] Linz Inst Technol LIT, Human Ctr Grp, Linz, Austria
[3] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[4] Univ Roma Tor Vergata, Dept Elect Engn, Rome, Italy
[5] Imperial Coll London, GLAM Grp Language, Audio & Mus, London, England
Source
PLOS ONE | 2023, Vol. 18, Issue 1
Funding
Austrian Science Fund (FWF); EU Horizon 2020;
Keywords
RECOGNITION; PROSODY; PITCH;
DOI
10.1371/journal.pone.0281079
CLC classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [Natural Sciences (General)];
Discipline codes
07 ; 0710 ; 09 ;
Abstract
This article contributes to a more adequate modelling of emotions encoded in speech by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on a few emotions and disregard all others ('closed world'). Second, studies use either clean (lab) data or real-life data, but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different database sizes ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' as categories not contained in the test set, real-life noises that mask the clean recordings, and different training-set sizes for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations, even under degraded acoustic conditions.
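To make the pipeline described in the abstract concrete, the following is a minimal sketch, assuming a wav2vec2 feature extractor (the checkpoint facebook/wav2vec2-base-960h), mean pooling of hidden states, additive noise mixing at a target SNR, and a linear SVM classifier. None of these specifics are taken from the paper, which only states that wav2vec2 representations, real-life noise masking, and varying training-set sizes were used.

```python
# Minimal sketch, NOT the paper's implementation: wav2vec2 utterance
# embeddings, additive noise masking at a chosen SNR, and a linear
# classifier. Checkpoint name, mean pooling, and LinearSVC are assumptions.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2Model
from sklearn.svm import LinearSVC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay real-life noise on a clean recording at the requested SNR (dB)."""
    noise = np.resize(noise, clean.shape)              # loop/trim noise to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def embed(waveform_16k: np.ndarray) -> np.ndarray:
    """Mean-pool wav2vec2 hidden states into one utterance-level vector."""
    inputs = processor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(inputs.input_values).last_hidden_state   # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Hypothetical usage: train on embeddings of clean or noise-masked speech,
# varying the training-set size, then compare the classifier's confusion
# matrix against human perception of the same utterances.
# X = np.stack([embed(mix_at_snr(w, noise, snr_db=0.0)) for w in train_waves])
# clf = LinearSVC().fit(X, y_labels)
```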
Pages: 26