Video Visual Relation Detection With Contextual Knowledge Embedding

Authors
Cao, Qianwen [1, 2]
Huang, Heyan [1, 2]
Affiliations
[1] Beijing Inst Technol, Sch Comp, Beijing 100081, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge engineering; Computer vision; knowledge embedding; video understanding; video visual relation detection; visual relation tagging;
DOI
10.1109/TKDE.2023.3270328
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video visual relation detection (VidVRD) aims at abstracting structured relations in the form of <subject-predicate-object> from videos. The triple formulation makes the search space extremely large and the distribution unbalanced. Existing works usually predict the relationships from visual, spatial, and semantic cues. Among them, semantic cues are responsible for exploring the semantic connections between objects, which is crucial for transferring knowledge across relations. However, most of these works extract semantic cues by simply mapping the object labels to classified features, which ignores the contextual surroundings and results in poor performance on low-frequency relations. To alleviate these issues, we propose a novel network, termed Contextual Knowledge Embedded Relation Network (CKERN), to facilitate VidVRD by establishing contextual knowledge embeddings for detected object pairs in relations from two aspects: commonsense attributes and prior linguistic dependencies. Specifically, we take the pair as a query to extract relational facts from a commonsense knowledge base, then encode them to explicitly construct semantic surroundings for relations. In addition, the statistics of object pairs with different predicates, distilled from large-scale visual relations, are taken into account to represent the linguistic regularity of relations. Extensive experimental results on benchmark datasets demonstrate the effectiveness and robustness of our proposed model.
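The "statistics of object pairs with different predicates" mentioned in the abstract can be illustrated with a minimal sketch. This is not the CKERN implementation: the function name `predicate_prior`, the toy triples, and the add-alpha smoothing are all assumptions made here for illustration. It only shows one simple way to turn relation counts into a prior distribution over predicates for a given subject-object pair.

```python
from collections import Counter, defaultdict


def predicate_prior(triples, predicates, alpha=1.0):
    """Build an estimator of P(predicate | subject, object).

    Hypothetical helper: counts how often each predicate occurs with each
    (subject, object) pair and applies add-alpha smoothing, so that unseen
    pairs fall back to a uniform distribution.
    """
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1

    def prior(subj, obj):
        # Use .get so that querying an unseen pair does not create an entry.
        pair_counts = counts.get((subj, obj), Counter())
        total = sum(pair_counts.values()) + alpha * len(predicates)
        return {p: (pair_counts[p] + alpha) / total for p in predicates}

    return prior


# Toy annotations (assumed for illustration, not taken from any dataset).
triples = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "next_to", "bicycle"),
    ("dog", "chase", "person"),
]
predicates = ["ride", "next_to", "chase"]

prior = predicate_prior(triples, predicates)
dist = prior("person", "bicycle")    # frequent pair: "ride" gets the most mass
unseen = prior("cat", "sofa")        # unseen pair: uniform over predicates
```

A prior like this could be combined with visual and spatial scores so that rare relations are not scored from appearance alone; how the paper actually encodes and fuses these statistics is not stated in the abstract.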
Pages: 13083-13095
Number of pages: 13