Video Visual Relation Detection With Contextual Knowledge Embedding

Authors
Cao, Qianwen [1, 2]
Huang, Heyan [1, 2]
Affiliations
[1] Beijing Inst Technol, Sch Comp, Beijing 100081, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge engineering; Computer vision; knowledge embedding; video understanding; video visual relation detection; visual relation tagging;
DOI
10.1109/TKDE.2023.3270328
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video visual relation detection (VidVRD) aims to extract structured relations of the form <subject-predicate-object> from videos. Because relations are triplets, the search space is extremely large and the relation distribution is highly imbalanced. Existing works usually predict relationships from visual, spatial, and semantic cues. Among these, semantic cues capture the semantic connections between objects, which are crucial for transferring knowledge across relations. However, most of these works extract semantic cues by simply mapping object labels to class-level features, ignoring the contextual surroundings and thus performing poorly on low-frequency relations. To alleviate these issues, we propose a novel network, termed Contextual Knowledge Embedded Relation Network (CKERN), which facilitates VidVRD by building contextual knowledge embeddings for the detected object pairs in relations from two aspects: commonsense attributes and prior linguistic dependencies. Specifically, we take each object pair as a query to retrieve relational facts from a commonsense knowledge base, and then encode these facts to explicitly construct semantic surroundings for relations. In addition, we take into account the statistics of object pairs with different predicates, distilled from large-scale visual relations, to represent the linguistic regularity of relations. Extensive experiments on benchmark datasets demonstrate the effectiveness and robustness of the proposed model.
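The abstract does not say how the pair-predicate statistics are computed or encoded. As an illustration only, the sketch below assumes the simplest such statistic: a smoothed empirical distribution P(predicate | subject, object) counted over annotated triplets, together with the arithmetic behind the "extremely large" triplet search space. The function names `triplet_search_space` and `predicate_prior`, the smoothing parameter `alpha`, the class counts, and the toy triplets are all hypothetical and are not taken from CKERN.

```python
from collections import Counter, defaultdict


def triplet_search_space(num_object_classes: int, num_predicates: int) -> int:
    """Count possible <subject-predicate-object> triplets, assuming
    (hypothetically) that any object class may act as subject or object."""
    return num_object_classes * num_predicates * num_object_classes


def predicate_prior(triplets, predicates, alpha=1.0):
    """Hypothetical sketch of a linguistic prior P(predicate | subject, object).

    Counts how often each predicate links a (subject, object) pair and applies
    additive (Laplace) smoothing `alpha`, so predicates never seen for a pair
    keep a small non-zero probability.
    """
    counts = defaultdict(Counter)
    for subj, pred, obj in triplets:
        counts[(subj, obj)][pred] += 1

    def prior(subj, obj):
        c = counts.get((subj, obj), Counter())
        total = sum(c.values()) + alpha * len(predicates)
        return {p: (c[p] + alpha) / total for p in predicates}

    return prior


# Toy usage with made-up triplets (illustrative only).
predicates = ["ride", "next_to", "watch"]
data = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "next_to", "bicycle"),
    ("dog", "watch", "person"),
]
prior = predicate_prior(data, predicates)
print(prior("person", "bicycle"))  # ride: 0.5, next_to: ~0.333, watch: ~0.167
print(prior("cat", "sofa"))        # unseen pair: uniform, ~0.333 each
print(triplet_search_space(35, 132))  # 161700 with these illustrative counts
```

Smoothing keeps rare predicates above zero, which matters for the low-frequency relations the abstract highlights; an unseen pair falls back to a uniform distribution rather than to all zeros.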
Pages: 13083-13095
Number of pages: 13