Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion

Cited by: 7
Authors
Zhang, Beibei [1]
Yu, Fan [1,2]
Gao, Yanxin [1]
Ren, Tongwei [1,2]
Wu, Gangshan [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanjing Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
National Science Foundation (US);
Keywords
Deep video understanding; relationship analysis; interaction analysis; multimodal feature fusion;
DOI
10.1145/3474085.3479214
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
To comprehend long-duration videos, the deep video understanding (DVU) task requires recognizing interactions at the scene level and relationships at the movie level, and answering questions at both levels. In this paper, we propose a solution to the DVU task that applies joint learning of interaction and relationship prediction together with multimodal feature fusion. Our solution decomposes the DVU task into three jointly learned sub-tasks: scene sentiment classification, scene interaction recognition, and super-scene video relationship recognition, all of which exploit text, visual, and audio features and predict representations in a shared semantic space. Since sentiment, interaction, and relationship are related to one another, we train a unified framework with joint learning. We then answer the video-analysis questions in DVU according to the results of the three sub-tasks. Experiments on the HLVU dataset demonstrate the effectiveness of our method.
Pages: 4848-4852
Page count: 5
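
The abstract describes a joint-learning architecture: per-modality features are projected into a shared semantic space, fused, and supervised by three task heads whose losses are trained together. Below is a minimal illustrative sketch of such a design in PyTorch. It is not the authors' released code; all dimensions, layer choices, label counts, and the uniform loss weighting are assumptions for illustration only.

import torch
import torch.nn as nn

class JointDVUModel(nn.Module):
    """Sketch of multimodal fusion with three jointly trained heads.
    Feature dimensions and class counts are hypothetical placeholders."""

    def __init__(self, text_dim=768, visual_dim=2048, audio_dim=128,
                 fused_dim=512, num_sentiments=5, num_interactions=10,
                 num_relationships=20):
        super().__init__()
        # Project each modality into a common semantic space, then fuse.
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.fusion = nn.Sequential(
            nn.Linear(3 * fused_dim, fused_dim), nn.ReLU(), nn.Dropout(0.1))
        # One classification head per sub-task named in the abstract.
        self.sentiment_head = nn.Linear(fused_dim, num_sentiments)
        self.interaction_head = nn.Linear(fused_dim, num_interactions)
        self.relationship_head = nn.Linear(fused_dim, num_relationships)

    def forward(self, text_feat, visual_feat, audio_feat):
        fused = self.fusion(torch.cat([
            self.text_proj(text_feat),
            self.visual_proj(visual_feat),
            self.audio_proj(audio_feat)], dim=-1))
        return (self.sentiment_head(fused),
                self.interaction_head(fused),
                self.relationship_head(fused))

# Joint training step on a dummy batch of 4 pre-extracted feature vectors.
# Summing the three losses (uniform weights are an assumption) sends
# gradients from every sub-task through the shared fusion layers.
model = JointDVUModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

text = torch.randn(4, 768)
visual = torch.randn(4, 2048)
audio = torch.randn(4, 128)
sent_y = torch.randint(0, 5, (4,))
inter_y = torch.randint(0, 10, (4,))
rel_y = torch.randint(0, 20, (4,))

sent_logits, inter_logits, rel_logits = model(text, visual, audio)
loss = (criterion(sent_logits, sent_y)
        + criterion(inter_logits, inter_y)
        + criterion(rel_logits, rel_y))
optimizer.zero_grad()
loss.backward()
optimizer.step()

Sharing the fusion layers across the three heads is what lets correlated signals (sentiment, interaction, relationship) reinforce one another, which is the stated motivation for joint learning in the abstract.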