Recognition of Emotions in User-Generated Videos through Frame-Level Adaptation and Emotion Intensity Learning

Cited: 1
Authors
Zhang, Haimin [1 ]
Xu, Min [1 ]
Affiliations
[1] Univ Technol Sydney, Sch Elect & Data Engn, Ultimo, NSW 2007, Australia
Keywords
Videos; Feature extraction; Emotion recognition; Task analysis; Computer architecture; Semantics; Adaptation models; Adversarial domain adaptation; emotion intensity learning; video emotion recognition
DOI
10.1109/TMM.2021.3134167
CLC Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Recognition of emotions in user-generated videos has attracted considerable research attention. Most existing approaches focus on learning frame-level features and fail to consider frame-level emotion intensities, which are critical for video representation. In this research, we aim to extract frame-level features and emotion intensities by transferring emotional information from an image emotion dataset. To achieve this goal, we propose an end-to-end network for joint emotion recognition and intensity learning with unsupervised adversarial adaptation. The proposed network consists of a classification stream, an intensity learning stream, and an adversarial adaptation module. The classification stream generates pseudo intensity maps with the class activation mapping method, which are used to train the intensity learning subnetwork. The intensity learning stream is built upon an improved feature pyramid network in which features from different scales are cross-connected. The adversarial adaptation module reduces the domain difference between the source dataset and the target video frames. By aligning cross-domain features, we enable our network to learn on the source data while generalizing to video frames. Finally, we apply a weighted sum pooling method to frame-level features and emotion intensities to generate video-level features. We evaluate the proposed method on two benchmark datasets, i.e., VideoEmotion-8 and Ekman-6. The experimental results show that the proposed method achieves improved performance compared to previous state-of-the-art methods.
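The final aggregation step in the abstract (weighted sum pooling of frame-level features by frame-level emotion intensities) can be sketched as follows. This is a minimal NumPy illustration, not the authors' exact formulation: the function name and the choice to normalize intensities so the weights sum to one are assumptions.

```python
import numpy as np

def weighted_sum_pooling(frame_features, frame_intensities):
    """Aggregate frame-level features into a single video-level feature,
    weighting each frame by its predicted emotion intensity.

    frame_features: array-like of shape (num_frames, feature_dim)
    frame_intensities: array-like of shape (num_frames,)
    """
    feats = np.asarray(frame_features, dtype=float)
    w = np.asarray(frame_intensities, dtype=float)
    w = w / w.sum()  # normalize intensities into pooling weights
    # Weighted sum over the frame axis yields the video-level feature.
    return (feats * w[:, None]).sum(axis=0)
```

Under this sketch, frames with higher predicted emotion intensity contribute more to the video-level representation, while low-intensity frames are down-weighted rather than discarded.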
Pages: 881-891
Page count: 11
Related Papers
24 items in total
  • [1] Recognition of Emotions in User-Generated Videos With Kernelized Features
    Zhang, Haimin
    Xu, Min
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (10) : 2824 - 2835
  • [2] Predicting Emotions in User-Generated Videos
    Jiang, Yu-Gang
    Xu, Baohan
    Xue, Xiangyang
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 73 - 79
  • [3] Emotion Prediction from User-Generated Videos by Emotion Wheel Guided Deep Learning
    Ho, Che-Ting
    Lin, Yu-Hsun
    Wu, Ja-Ling
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT I, 2016, 9947 : 3 - 12
  • [4] Emotion recognition in user-generated videos with long-range correlation-aware network
    Yi, Yun
    Zhou, Jin
    Wang, Hanli
    Tang, Pengjie
    Wang, Min
    [J]. IET IMAGE PROCESSING, 2024
  • [5] MULTITASK LEARNING FOR FRAME-LEVEL INSTRUMENT RECOGNITION
    Hung, Yun-Ning
    Chen, Yi-An
    Yang, Yi-Hsuan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 381 - 385
  • [6] Frame-Level Teacher-Student Learning With Data Privacy for EEG Emotion Recognition
    Gu, Tianhao
    Wang, Zhe
    Xu, Xinlei
    Li, Dongdong
    Yang, Hai
    Du, Wenli
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 11021 - 11028
  • [7] An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos
    Zhao, Sicheng
    Ma, Yunsheng
    Gu, Yang
    Yang, Jufeng
    Xing, Tengfei
    Xu, Pengfei
    Hu, Runbo
    Chai, Hua
    Keutzer, Kurt
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 303 - 311
  • [8] User-generated video emotion recognition based on key frames
    Wei, Jie
    Yang, Xinyu
    Dong, Yizhuo
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (09) : 14343 - 14361
  • [9] FLDNet: Frame-Level Distilling Neural Network for EEG Emotion Recognition
    Wang, Zhe
    Gu, Tianhao
    Zhu, Yiwen
    Li, Dongdong
    Yang, Hai
    Du, Wenli
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (07) : 2533 - 2544