Multimodal Local-Global Attention Network for Affective Video Content Analysis

Cited by: 37
Authors
Ou, Yangjun [1 ]
Chen, Zhenzhong [1 ]
Wu, Feng [2 ]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China
Keywords
Visualization; Task analysis; Psychology; Feature extraction; Hidden Markov models; Analytical models; Brain modeling; Affective content analysis; multimodal learning; attention; EMOTION RECOGNITION; MODEL; REPRESENTATION; INTEGRATION; DATABASE;
DOI
10.1109/TCSVT.2020.3014889
Chinese Library Classification
TM (Electrical Engineering); TN (Electronics and Communication Technology);
Discipline Codes
0808; 0809;
Abstract
With the rapid development of video distribution and broadcasting, affective video content analysis has recently attracted considerable research attention. Predicting the emotional responses of movie audiences is a challenging task in affective computing, since induced emotions are relatively subjective. In this article, we propose a multimodal local-global attention network (MMLGAN) for affective video content analysis. Inspired by the multimodal integration effect, we extend the attention mechanism to multi-level fusion and design a multimodal fusion unit to obtain a global representation of affective video. The multimodal fusion unit selects key parts from multimodal local streams in the local attention stage and captures the information distribution across time in the global attention stage. Experiments on the LIRIS-ACCEDE dataset, the MediaEval 2015 and 2016 datasets, the FilmStim dataset, the DEAP dataset and the VideoEmotion dataset demonstrate the effectiveness of our approach compared with state-of-the-art methods.
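The local-then-global fusion described in the abstract can be sketched very loosely as follows. This is an illustrative assumption, not the paper's actual MMLGAN: the real model uses learned attention networks, whereas here each stage is reduced to a single scoring vector per modality, with hypothetical shapes chosen for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(stream, w):
    # stream: (T, d) per-modality temporal features; w: (d,) scoring vector.
    # Local stage: weight each time step, then pool into one summary vector.
    scores = softmax(stream @ w)      # (T,) attention weights over time
    return scores @ stream            # (d,) modality-level summary

def local_global_fusion(streams, local_ws, global_w):
    # Local stage: summarize each modality stream independently.
    summaries = np.stack([local_attention(s, w)
                          for s, w in zip(streams, local_ws)])  # (M, d)
    # Global stage: attend over modality summaries to fuse them
    # into a single global representation of the video clip.
    g = softmax(summaries @ global_w)  # (M,) per-modality weights
    return g @ summaries               # (d,) global representation

rng = np.random.default_rng(0)
d, T = 8, 5
# Three hypothetical modality streams, e.g. visual / audio / motion features.
streams = [rng.standard_normal((T, d)) for _ in range(3)]
local_ws = [rng.standard_normal(d) for _ in range(3)]
global_w = rng.standard_normal(d)
rep = local_global_fusion(streams, local_ws, global_w)
print(rep.shape)  # (8,)
```

The two-stage structure mirrors the abstract's description: local attention picks out salient parts within each modality, and global attention captures how information is distributed across the fused streams.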
Pages: 1901 - 1914
Page count: 14
Related papers
50 records in total
  • [11] Learning a Local-Global Alignment Network for Satellite Video Super-Resolution
    Jin, Xianyu
    He, Jiang
    Xiao, Yi
    Yuan, Qiangqiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [12] Multimodal Local-Global Ranking Fusion for Emotion Recognition
    Liang, Paul Pu
    Zadeh, Amir
    Morency, Louis-Philippe
    ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 472 - 476
  • [13] A multi-focus image fusion network with local-global joint attention module
    Zou, Xinheng
    Yang, You
    Zhai, Hao
    Jiang, Weiping
    Pan, Xin
    Applied Intelligence, 2025, 55 (02)
  • [14] Hyperspectral Image Super-Resolution Network of Local-Global Attention Feature Reuse
    Size, Wang
    Xin, Guan
    Qiang, Li
    ACTA OPTICA SINICA, 2023, 43 (21)
  • [15] MAFPN: a mixed local-global attention feature pyramid network for aerial object detection
    Ma, Tengfei
    Yin, Haitao
    REMOTE SENSING LETTERS, 2024, 15 (09) : 907 - 918
  • [16] Local-global visual interaction attention for image captioning
    Wang, Changzhi
    Gu, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2022, 130
  • [17] Video Diffusion Models with Local-Global Context Guidance
    Yang, Siyuan
    Zhang, Lu
    Liu, Yu
    Jiang, Zhizhuo
    He, You
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1640 - 1648
  • [18] Affective video content analysis based on multimodal data fusion in heterogeneous networks
    Guo, Jie
    Song, Bin
    Zhang, Peng
    Ma, Mengdi
    Luo, Wenwen
    Junmei
    INFORMATION FUSION, 2019, 51 : 224 - 232
  • [19] Affective Video Content Analysis With Adaptive Fusion Recurrent Network
    Yi, Yun
    Wang, Hanli
    Li, Qinyu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (09) : 2454 - 2466
  • [20] A multi-focus image fusion network with local-global joint attention module
    Xinheng Zou
    You Yang
    Hao Zhai
    Weiping Jiang
    Xin Pan
    Applied Intelligence, 2025, 55 (2)