Entity Dependency Learning Network With Relation Prediction for Video Visual Relation Detection

Cited by: 0

Authors:

Zhang, Guoguang [1]

Tang, Yepeng [1]

Zhang, Chunjie [1]

Zheng, Xiaolong [2,3,4]

Zhao, Yao [1]

Affiliations:

[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China

[2] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China

[3] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

[4] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Trajectory; Visualization; Task analysis; Object detection; Encoding; Decoding; Visual relation detection; entity dependency learning; video understanding; GENERATION;

DOI:

10.1109/TCSVT.2024.3437437

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic and Communication Technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

Video Visual Relation Detection (VidVRD) is a pivotal task in video analysis. It involves detecting object trajectories in videos, predicting the potential dynamic relations between these trajectories, and ultimately representing these relations as <subject, predicate, object> triplets. Correct relation prediction is vital for VidVRD. Existing methods mostly adopt a simple fusion of the visual and language features of entity trajectories as the feature representation for relation predicates. However, these methods do not take into account the dependency information between the relation predicate and the subject and object within the triplet. To address this issue, we propose the entity dependency learning network (EDLN), which captures the dependency information between relation predicates and subjects, objects, and subject-object pairs, and adaptively integrates this dependency information into the feature representation of relation predicates. Additionally, to effectively model the features of the relations that exist between various object entity pairs, we introduce a fully convolutional encoding approach as a substitute for the self-attention mechanism of the Transformer in the context encoding phase for relation predicate features. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed EDLN.
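As a reading aid, the task output described in the abstract can be pictured with a small data-structure sketch. This is not the authors' implementation: the names `Trajectory`, `RelationTriplet`, `temporal_overlap`, and `candidate_pairs` are hypothetical, and the rule that a subject-object pair is only considered when the two trajectories share frames is an assumption made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Trajectory:
    """A tracked entity (hypothetical structure, not from the paper)."""
    track_id: int
    category: str
    start_frame: int
    end_frame: int  # inclusive


@dataclass(frozen=True)
class RelationTriplet:
    """One <subject, predicate, object> result."""
    subject: Trajectory
    predicate: str
    object: Trajectory


def temporal_overlap(a: Trajectory, b: Trajectory) -> Optional[Tuple[int, int]]:
    """Return the shared (start, end) frame span, or None if there is none."""
    start = max(a.start_frame, b.start_frame)
    end = min(a.end_frame, b.end_frame)
    return (start, end) if start <= end else None


def candidate_pairs(trajs: List[Trajectory]) -> List[Tuple[Trajectory, Trajectory]]:
    """Ordered (subject, object) pairs whose trajectories co-occur in time.

    Assumption: pairs without shared frames are skipped.
    """
    return [
        (s, o)
        for s, o in permutations(trajs, 2)
        if temporal_overlap(s, o) is not None
    ]


if __name__ == "__main__":
    dog = Trajectory(0, "dog", 0, 30)
    frisbee = Trajectory(1, "frisbee", 10, 40)
    person = Trajectory(2, "person", 50, 60)
    for s, o in candidate_pairs([dog, frisbee, person]):
        print(s.category, "->", o.category)
```

Pairs are ordered because a predicate is directional: <dog, chase, frisbee> and <frisbee, chase, dog> are different triplets, so each direction would be scored separately by a relation predictor.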

Pages: 12425-12436

Page count: 12
