Multimodal Local-Global Attention Network for Affective Video Content Analysis

Cited by: 37
Authors
Ou, Yangjun [1]
Chen, Zhenzhong [1]
Wu, Feng [2]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China
Keywords
Visualization; Task analysis; Psychology; Feature extraction; Hidden Markov models; Analytical models; Brain modeling; Affective content analysis; multimodal learning; attention; EMOTION RECOGNITION; MODEL; REPRESENTATION; INTEGRATION; DATABASE
DOI
10.1109/TCSVT.2020.3014889
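Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809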
Abstract
With the rapid development of video distribution and broadcasting, affective video content analysis has recently attracted considerable research and development activity. Predicting the emotional responses of movie audiences is a challenging task in affective computing, since induced emotions are relatively subjective. In this article, we propose a multimodal local-global attention network (MMLGAN) for affective video content analysis. Inspired by the multimodal integration effect, we extend the attention mechanism to multi-level fusion and design a multimodal fusion unit to obtain a global representation of affective video. The multimodal fusion unit selects key parts from multimodal local streams in the local attention stage and captures the information distribution across time in the global attention stage. Experiments on the LIRIS-ACCEDE dataset, the MediaEval 2015 and 2016 datasets, the FilmStim dataset, the DEAP dataset and the VideoEmotion dataset demonstrate the effectiveness of our approach compared with state-of-the-art methods.
Pages: 1901 - 1914
Number of pages: 14
Related papers
50 in total
  • [41] A Local-Global Attention Fusion Framework with Tensor Decomposition for Medical Diagnosis
    Wu, Peishu
    Li, Han
    Hu, Liwei
    Ge, Jirong
    Zeng, Nianyin
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (06) : 1536 - 1538
  • [42] Video captioning with global and local text attention
    Yuqing Peng
    Chenxi Wang
    Yixin Pei
    Yingjun Li
    The Visual Computer, 2022, 38 : 4267 - 4278
  • [43] Video captioning with global and local text attention
    Peng, Yuqing
    Wang, Chenxi
    Pei, Yixin
    Li, Yingjun
    VISUAL COMPUTER, 2022, 38 (12) : 4267 - 4278
  • [44] Simultaneous Facial Age Group and Gender Recognition using Efficient Local-Global Attention Network for Intelligent Advertising
    Priadana, Adri
    Duy-Linh Nguyen
    Xuan-Thuy Vo
    Putro, Muhamad Dwisnanto
    Cao, Ge
    Jo, Kanghyun
    2024 33RD INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, ISIE 2024, 2024,
  • [45] LGANET: LOCAL-GLOBAL AUGMENTATION NETWORK FOR SKIN LESION SEGMENTATION
    Guo, Qingqing
    Fang, Xianyong
    Wang, Linbo
    Zhang, Enming
    Liu, Zhengyi
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
  • [46] Local-Global Transformer Neural Network for temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Tang, Xianglong
    MULTIMEDIA SYSTEMS, 2023, 29 (02) : 615 - 626
  • [47] Local-global feature fusion network for hyperspectral image classification
    Gan, Yuquan
    Zhang, Hao
    Liu, Weihua
    Ma, Jieming
    Luo, Yiming
    Pan, Yushan
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, : 8548 - 8575
  • [48] Unified multi-stage fusion network for affective video content analysis
    Yi, Yun
    Wang, Hanli
    Tang, Pengjie
    ELECTRONICS LETTERS, 2022, 58 (21) : 795 - 797
  • [49] Dual-Domain Dynamic Local-Global Network for Pansharpening
    Wang, Zeping
    Hu, Jianwen
    Feng, Xi
    Kang, Xudong
    Mo, Yan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [50] Divide, Conquer and Combine: Hierarchical Feature Fusion Network with Local and Global Perspectives for Multimodal Affective Computing
    Mai, Sijie
    Hu, Haifeng
    Xing, Songlong
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 481 - 492