Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion

Cited by: 7
Authors
Zhang, Beibei [1 ]
Yu, Fan [1 ,2 ]
Gao, Yanxin [1 ]
Ren, Tongwei [1 ,2 ]
Wu, Gangshan [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanjing Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Deep video understanding; relationship analysis; interaction analysis; multimodal feature fusion;
DOI
10.1145/3474085.3479214
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To comprehend long-duration videos, the deep video understanding (DVU) task was proposed to recognize interactions at the scene level and relationships at the movie level, and to answer questions at both levels. In this paper, we propose a solution to the DVU task that applies joint learning of interaction and relationship prediction together with multimodal feature fusion. Our solution decomposes the DVU task into three jointly learned sub-tasks: scene sentiment classification, scene interaction recognition, and super-scene video relationship recognition, all of which utilize text, visual, and audio features and predict representations in a semantic space. Since sentiment, interaction, and relationship are related to each other, we train a unified framework with joint learning. We then answer the video-analysis questions in DVU according to the results of the three sub-tasks. We conduct experiments on the HLVU dataset to evaluate the effectiveness of our method.
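The abstract's design — fuse text, visual, and audio features into one shared representation, then train three task heads (sentiment, interaction, relationship) under a joint objective — can be sketched in a few lines. The concatenation fusion, head dimensions, and loss weights below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): late fusion by concatenating
# text, visual, and audio feature vectors, then feeding one shared fused
# vector to three task heads trained with a weighted joint loss.

def fuse(text_feat, visual_feat, audio_feat):
    """Concatenation-based multimodal fusion."""
    return text_feat + visual_feat + audio_feat

def linear_head(fused, weights, bias):
    """A per-task linear head: one score per output class."""
    return [sum(f * w for f, w in zip(fused, row)) + b
            for row, b in zip(weights, bias)]

def joint_loss(losses, task_weights):
    """Joint learning: weighted sum of the per-task losses."""
    return sum(w * l for w, l in zip(task_weights, losses))

# Toy 2-d features per modality -> 6-d fused vector.
fused = fuse([0.1, 0.2], [0.3, 0.4], [0.5, 0.6])

# The three sub-tasks share the same fused representation.
sentiment = linear_head(fused, [[1.0] * 6, [0.5] * 6], [0.0, 0.0])
interaction = linear_head(fused, [[0.2] * 6], [0.1])
relationship = linear_head(fused, [[-1.0] * 6], [0.0])

# Squared-error losses against toy targets, combined into one objective
# so gradients from all three tasks update the shared fusion backbone.
total = joint_loss(
    [(sentiment[0] - 2.0) ** 2,
     (interaction[0] - 0.5) ** 2,
     (relationship[0] + 2.0) ** 2],
    task_weights=[1.0, 1.0, 1.0],
)
```

Because the three heads read from one fused vector, minimizing the joint loss lets the related signals (sentiment, interaction, relationship) regularize each other, which is the motivation the abstract gives for joint training.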
Pages: 4848-4852
Page count: 5