Entity Dependency Learning Network With Relation Prediction for Video Visual Relation Detection

Cited by: 0
Authors
Zhang, Guoguang [1]
Tang, Yepeng [1]
Zhang, Chunjie [1]
Zheng, Xiaolong [2, 3, 4]
Zhao, Yao [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China
[2] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[4] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Trajectory; Visualization; Task analysis; Object detection; Encoding; Decoding; Visual relation detection; entity dependency learning; video understanding; GENERATION;
DOI
10.1109/TCSVT.2024.3437437
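Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809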
Abstract
Video Visual Relation Detection (VidVRD) is a pivotal task in video analysis. It involves detecting object trajectories in videos, predicting the potential dynamic relations between these trajectories, and ultimately representing these relations as <subject, predicate, object> triplets. Correct relation prediction is vital for VidVRD. Existing methods mostly adopt a simple fusion of the visual and language features of entity trajectories as the feature representation for relation predicates. However, these methods do not take into account the dependency information between the relation predicate and the subject and object within the triplet. To address this issue, we propose the entity dependency learning network (EDLN), which captures the dependency information between relation predicates and subjects, objects, and subject-object pairs, and adaptively integrates this dependency information into the feature representation of relation predicates. Additionally, to effectively model the features of the relations between various object entity pairs, in the context encoding phase for relation predicate features, we introduce a fully convolutional encoding approach as a substitute for the self-attention mechanism in the Transformer. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed EDLN.
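The abstract names two ideas without giving their details: adaptively integrating dependency features into the predicate feature, and encoding context across entity pairs with convolution instead of self-attention. The following is only an illustrative sketch of those two ideas in plain Python and should not be read as the EDLN architecture. The function names (`adaptive_fuse`, `conv_context`), the dot-product scoring, the softmax weighting, the residual addition, and the fixed odd-length kernel are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(predicate: Vector, dependencies: Sequence[Vector]) -> Vector:
    """Hypothetical adaptive fusion (an assumption, not the paper's formula).

    Each dependency feature (e.g. subject, object, subject-object pair) is
    scored by its dot product with the predicate feature; the softmax-weighted
    sum of the dependency features is then added to the predicate feature.
    """
    scores = [sum(p * d for p, d in zip(predicate, dep)) for dep in dependencies]
    weights = softmax(scores)
    fused = list(predicate)
    for w, dep in zip(weights, dependencies):
        for i, value in enumerate(dep):
            fused[i] += w * value
    return fused


def conv_context(features: Sequence[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Hypothetical convolutional context encoding over a sequence of pair features.

    Applies the same odd-length 1D kernel to every feature dimension along the
    pair axis, with zero padding so the output has as many rows as the input.
    """
    half = len(kernel) // 2
    out: List[Vector] = []
    for i in range(len(features)):
        row = [0.0] * len(features[i])
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(features):
                for d, value in enumerate(features[j]):
                    row[d] += weight * value
        out.append(row)
    return out
```

In this sketch, `adaptive_fuse(predicate, [subject, obj, pair])` would produce one enriched predicate feature per entity pair, and `conv_context` would then mix each pair's feature with those of its neighbours in the sequence. In a trained model the scoring function and the kernel would be learned rather than fixed.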
Pages: 12425-12436
Page count: 12
Related papers
50 in total
  • [1] Video Visual Relation Detection
    Shang, Xindi
    Ren, Tongwei
    Guo, Jingfan
    Zhang, Hanwang
    Chua, Tat-Seng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1300 - 1308
  • [2] Attention Guided Relation Detection Approach for Video Visual Relation Detection
    Cao, Qianwen
    Huang, Heyan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 3896 - 3907
  • [3] VSRN: Visual-Semantic Relation Network for Video Visual Relation Inference
    Cao, Qianwen
    Huang, Heyan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 768 - 777
  • [4] Illation of Video Visual Relation Detection Based on Graph Neural Network
    Qu, MingCheng
    Cui, JianXun
    Nie, Yuxi
    Su, TongHua
    IEEE ACCESS, 2021, 9 : 141144 - 141153
  • [5] Concept-Enhanced Relation Network for Video Visual Relation Inference
    Cao, Qianwen
    Huang, Heyan
    Ren, Mucheng
    Yuan, Changsen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (05) : 2233 - 2244
  • [6] A neural network framework for relation extraction: Learning entity semantic and relation pattern
    Zheng, Suncong
    Xu, Jiaming
    Zhou, Peng
    Bao, Hongyun
    Qi, Zhenyu
    Xu, Bo
    KNOWLEDGE-BASED SYSTEMS, 2016, 114 : 12 - 23
  • [7] Visual Translation Embedding Network for Visual Relation Detection
    Zhang, Hanwang
    Kyaw, Zawlin
    Chang, Shih-Fu
    Chua, Tat-Seng
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3107 - 3115
  • [8] Attribute-Driven Capsule Network for Entity Relation Prediction
    Chen, Jiayin
    Gong, Xiaolong
    Chen, Xi
    Ma, Zhiyi
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT I, 2020, 12084 : 675 - 686
  • [9] Video Visual Relation Detection With Contextual Knowledge Embedding
    Cao, Qianwen
    Huang, Heyan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (12) : 13083 - 13095
  • [10] Video Visual Relation Detection via Iterative Inference
    Shang, Xindi
    Li, Yicong
    Xiao, Junbin
    Ji, Wei
    Chua, Tat-Seng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3654 - 3663