Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion

Cited by: 7
Authors
Zhang, Beibei [1 ]
Yu, Fan [1 ,2 ]
Gao, Yanxin [1 ]
Ren, Tongwei [1 ,2 ]
Wu, Gangshan [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanjing Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Deep video understanding; relationship analysis; interaction analysis; multimodal feature fusion;
DOI
10.1145/3474085.3479214
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To comprehend long-duration videos, the deep video understanding (DVU) task was proposed to recognize interactions at the scene level and relationships at the movie level, and to answer questions at both levels. In this paper, we propose a solution to the DVU task that applies joint learning of interaction and relationship prediction together with multimodal feature fusion. Our solution decomposes the DVU task into three jointly learned sub-tasks: scene sentiment classification, scene interaction recognition, and super-scene video relationship recognition, all of which utilize text, visual, and audio features and predict representations in a semantic space. Since sentiment, interaction, and relationship are related to each other, we train a unified framework with joint learning. We then answer the video-analysis questions in DVU according to the results of the three sub-tasks. We conduct experiments on the HLVU dataset to evaluate the effectiveness of our method.
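The abstract's design — fuse text, visual, and audio features into one shared representation, then train three task heads (sentiment, interaction, relationship) under a joint objective — can be sketched in a few lines. The concatenation fusion, head dimensions, and loss weights below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): late fusion by concatenating
# text, visual, and audio feature vectors, then feeding one shared fused
# vector to three task heads trained with a weighted joint loss.

def fuse(text_feat, visual_feat, audio_feat):
    """Concatenation-based multimodal fusion."""
    return text_feat + visual_feat + audio_feat

def linear_head(fused, weights, bias):
    """A per-task linear head: one score per output class."""
    return [sum(f * w for f, w in zip(fused, row)) + b
            for row, b in zip(weights, bias)]

def joint_loss(losses, task_weights):
    """Joint learning: weighted sum of the per-task losses."""
    return sum(w * l for w, l in zip(task_weights, losses))

# Toy 2-d features per modality -> 6-d fused vector.
fused = fuse([0.1, 0.2], [0.3, 0.4], [0.5, 0.6])

# The three sub-tasks share the same fused representation.
sentiment = linear_head(fused, [[1.0] * 6, [0.5] * 6], [0.0, 0.0])
interaction = linear_head(fused, [[0.2] * 6], [0.1])
relationship = linear_head(fused, [[-1.0] * 6], [0.0])

# Squared-error losses against toy targets, combined into one objective
# so gradients from all three tasks update the shared fusion backbone.
total = joint_loss(
    [(sentiment[0] - 2.0) ** 2,
     (interaction[0] - 0.5) ** 2,
     (relationship[0] + 2.0) ** 2],
    task_weights=[1.0, 1.0, 1.0],
)
```

Because the three heads read from one fused vector, minimizing the joint loss lets the related signals (sentiment, interaction, relationship) regularize each other, which is the motivation the abstract gives for joint training.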
Pages: 4848-4852
Page count: 5