ISLA: Temporal Segmentation and Labeling for Audio-Visual Emotion Recognition

Cited by: 27
Authors
Kim, Yelin [1 ]
Provost, Emily Mower [2 ]
Affiliations
[1] SUNY Albany, Dept Elect & Comp Engn, Albany, NY 12206 USA
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Keywords
Audio-visual; emotion; recognition; multimodal; temporal; face region; speech; FACIAL EXPRESSION; SPEECH; CLASSIFICATION; MODALITIES; MOVEMENT; PROSODY; AREAS
DOI
10.1109/TAFFC.2017.2702653
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Emotion is an essential part of human interaction. Automatic emotion recognition can greatly benefit human-centered interactive technology, since the extracted emotion can be used to understand and respond to user needs. However, real-world emotion recognition faces a central challenge when a user is speaking: facial movements due to speech are often confused with facial movements related to emotion. Recent studies have found that phonetic information can reduce speech-related variability in the lower face region, but methods to differentiate upper face movements due to emotion from those due to speech remain underexplored. This gap motivates the proposed Informed Segmentation and Labeling Approach (ISLA). ISLA uses speech signals, which alter the dynamics of the lower and upper face regions, to temporally segment and label facial movements. We demonstrate how pitch can be used to improve estimates of emotion from the upper face, and how these estimates can be combined with emotion estimates from the lower face and speech in a multimodal classification system. Our emotion classification results on the IEMOCAP and SAVEE datasets show that ISLA improves overall classification performance. We also demonstrate how emotion estimates from different modalities correlate with one another, providing insights into the differences between posed and spontaneous expressions.
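To make the segmentation-and-fusion idea concrete, the sketch below illustrates one plausible reading of the abstract: frames are split by a pitch-based voiced/unvoiced decision, upper-face emotion scores are pooled separately over the two segment types, and the result is late-fused with lower-face and speech estimates. This is a minimal illustration, not the authors' implementation; the voicing threshold, the synthetic per-frame scores, and the equal fusion weights are all assumptions made for the example.

import numpy as np

# Minimal sketch of pitch-informed segmentation and late fusion.
# NOT the ISLA implementation: thresholds, feature shapes, and fusion
# weights below are illustrative assumptions only.

def segment_by_voicing(f0, voiced_threshold=50.0):
    """Label each frame voiced/unvoiced from a per-frame F0 track (Hz).

    Frames with F0 at or below `voiced_threshold` (an assumed value)
    are treated as unvoiced.
    """
    return f0 > voiced_threshold  # boolean mask, True = voiced

def pooled_scores(features, voiced_mask):
    """Average per-frame emotion scores separately over voiced and
    unvoiced segments, so speech-driven facial motion can be treated
    differently from motion outside speech (illustrative only)."""
    n_classes = features.shape[1]
    voiced = (features[voiced_mask].mean(axis=0)
              if voiced_mask.any() else np.zeros(n_classes))
    unvoiced = (features[~voiced_mask].mean(axis=0)
                if (~voiced_mask).any() else np.zeros(n_classes))
    return voiced, unvoiced

# --- toy example with synthetic data (4 emotion classes, 100 frames) ---
rng = np.random.default_rng(0)
n_frames, n_classes = 100, 4
# Synthetic F0 track: ~60% voiced frames (80-250 Hz), rest unvoiced (0 Hz).
f0 = np.where(rng.random(n_frames) < 0.6,
              rng.uniform(80.0, 250.0, n_frames), 0.0)
upper_face = rng.random((n_frames, n_classes))  # stand-in upper-face scores
lower_face = rng.random((n_frames, n_classes))  # stand-in lower-face scores
speech = rng.random((n_frames, n_classes))      # stand-in acoustic scores

mask = segment_by_voicing(f0)
upper_voiced, upper_unvoiced = pooled_scores(upper_face, mask)

# Simple late fusion of modality-level estimates; equal weights are an
# assumption made to keep the example self-contained.
fused = (0.5 * (upper_voiced + upper_unvoiced)
         + lower_face.mean(axis=0)
         + speech.mean(axis=0)) / 3.0
print("Predicted emotion class:", int(fused.argmax()))

In the paper's actual pipeline, per-segment estimates would feed trained classifiers rather than a fixed average; the equal-weight fusion here only stands in for that final classification step.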
Pages: 196-208
Page count: 13
Related Papers
50 records in total
  • [31] Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition
    Wei, Jie
    Hu, Guanyu
    Yang, Xinyu
    Luu, Anh Tuan
    Dong, Yizhuo
    INTERSPEECH 2022, 2022, : 1988 - 1992
  • [32] Audio-Visual Emotion Recognition Based on Facial Expression and Affective Speech
    Zhang, Shiqing
    Li, Lemin
    Zhao, Zhijin
MULTIMEDIA AND SIGNAL PROCESSING, 2012, 346 : 46+
  • [33] Leveraging Inter-rater Agreement for Audio-Visual Emotion Recognition
    Kim, Yelin
    Provost, Emily Mower
    2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 553 - 559
  • [34] Metric Learning-Based Multimodal Audio-Visual Emotion Recognition
    Ghaleb, Esam
    Popa, Mirela
    Asteriadis, Stylianos
    IEEE MULTIMEDIA, 2020, 27 (01) : 37 - 48
  • [35] Audio-Visual Emotion Recognition Based on a DBN Model with Constrained Asynchrony
    Chen, Danqi
    Jiang, Dongmei
    Ravyse, Ilse
    Sahli, Hichem
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON IMAGE AND GRAPHICS (ICIG 2009), 2009, : 912 - 916
  • [36] Audio-Visual Fusion Network Based on Conformer for Multimodal Emotion Recognition
    Guo, Peini
    Chen, Zhengyan
    Li, Yidi
    Liu, Hong
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 315 - 326
  • [37] Leveraging recent advances in deep learning for audio-Visual emotion recognition
    Schoneveld, Liam
    Othmani, Alice
    Abdelkawy, Hazem
    PATTERN RECOGNITION LETTERS, 2021, 146 : 1 - 7
  • [38] Learning Better Representations for Audio-Visual Emotion Recognition with Common Information
    Ma, Fei
    Zhang, Wei
    Li, Yang
    Huang, Shao-Lun
    Zhang, Lin
APPLIED SCIENCES-BASEL, 2020, 10 (20) : 1 - 23
  • [39] Audio-visual affect recognition
    Zeng, Zhihong
    Tu, Jilin
    Liu, Ming
    Huang, Thomas S.
    Pianfetti, Brian
    Roth, Dan
    Levinson, Stephen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (02) : 424 - 428
  • [40] Audio-visual integration of emotion expression
    Collignon, Olivier
    Girard, Simon
    Gosselin, Frederic
    Roy, Sylvain
    Saint-Amour, Dave
    Lassonde, Maryse
    Lepore, Franco
    BRAIN RESEARCH, 2008, 1242 : 126 - 135