Video Visual Relation Detection With Contextual Knowledge Embedding

Citations: 0
Authors
Cao, Qianwen [1 ,2 ]
Huang, Heyan [1 ,2 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp, Beijing 100081, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge engineering; Computer vision; knowledge embedding; video understanding; video visual relation detection; visual relation tagging;
DOI
10.1109/TKDE.2023.3270328
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video visual relation detection (VidVRD) aims at abstracting structured relations in the form of <subject, predicate, object> from videos. The triplet formulation makes the search space extremely large and the distribution unbalanced. Existing works usually predict relationships from visual, spatial, and semantic cues. Among them, semantic cues capture the semantic connections between objects, which is crucial for transferring knowledge across relations. However, most of these works extract semantic cues by simply mapping object labels to classification features, ignoring the contextual surroundings and resulting in poor performance on low-frequency relations. To alleviate these issues, we propose a novel network, termed Contextual Knowledge Embedded Relation Network (CKERN), which facilitates VidVRD by establishing contextual knowledge embeddings for detected object pairs from two aspects: commonsense attributes and prior linguistic dependencies. Specifically, we take the object pair as a query to extract relational facts from a commonsense knowledge base, then encode them to explicitly construct the semantic surroundings of a relation. In addition, the statistics of object pairs with different predicates, distilled from large-scale visual relations, are taken into account to represent the linguistic regularity of relations. Extensive experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed model.
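The prior linguistic-dependency cue described in the abstract can be illustrated with a minimal sketch: estimating a smoothed predicate distribution P(predicate | subject, object) from triplet co-occurrence statistics. This is a generic illustration of such a statistical prior, not the paper's actual implementation; the helper name and toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_predicate_prior(triplets, smoothing=1.0):
    """Estimate P(predicate | subject, object) from observed triplets.

    `triplets` is an iterable of (subject, predicate, object) tuples,
    e.g. distilled from a large-scale visual-relation corpus.
    Laplace smoothing keeps unseen predicates from getting zero mass,
    which matters for low-frequency relations.
    """
    counts = defaultdict(Counter)   # (subject, object) -> predicate counts
    predicates = set()
    for subj, pred, obj in triplets:
        counts[(subj, obj)][pred] += 1
        predicates.add(pred)

    def prior(subj, pred, obj):
        pair_counts = counts[(subj, obj)]
        total = sum(pair_counts.values()) + smoothing * len(predicates)
        return (pair_counts[pred] + smoothing) / total

    return prior

# Toy corpus of <subject, predicate, object> triplets (hypothetical).
corpus = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "push", "bicycle"),
    ("dog", "chase", "bicycle"),
]
prior = build_predicate_prior(corpus)
# "ride" gets more prior mass than "push" for the pair (person, bicycle);
# an unseen pair falls back to a uniform distribution over predicates.
assert prior("person", "ride", "bicycle") > prior("person", "push", "bicycle")
```

In a full model, such a prior would typically be combined with visual and spatial scores (e.g. multiplied or log-added) rather than used alone.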
Pages: 13083-13095
Page count: 13
Related Papers
50 records total
  • [11] Contextual relation embedding and interpretable triplet capsule for inductive relation prediction
    Wu, Jianfeng
    Mai, Sijie
    Hu, Haifeng
    NEUROCOMPUTING, 2022, 505 : 80 - 91
  • [12] Contextual Path Retrieval: A Contextual Entity Relation Embedding-based Approach
    Lo, Pei-Chi
    Lim, Ee-Peng
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2023, 41 (01)
  • [13] A conversational ambiguity resolution model for contextual knowledge embedding
    Li, Kai
    Liu, Jietao
    Yang, Yanshan
    NEUROCOMPUTING, 2025, 625
  • [14] Relation path embedding in knowledge graphs
    Lin, Xixun
    Liang, Yanchun
    Giunchiglia, Fausto
    Feng, Xiaoyue
    Guan, Renchu
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (09): 5629 - 5639
  • [15] Web video classification with visual and contextual semantics
    Afzal, Mehtab
    Shah, Nadir
    Muhammad, Tufail
    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2019, 32 (13)
  • [17] VRDFormer: End-to-End Video Visual Relation Detection with Transformers
    Zheng, Sipeng
    Chen, Shizhe
    Jin, Qin
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18814 - 18824
  • [18] Visual versus Textual Embedding for Video Retrieval
    Francis, Danny
    Pidou, Paul
    Merialdo, Bernard
    Huet, Benoit
    ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS (ACIVS 2017), 2017, 10617 : 386 - 395
  • [19] Illation of Video Visual Relation Detection Based on Graph Neural Network
    Qu, MingCheng
    Cui, JianXun
    Nie, Yuxi
    Su, TongHua
    IEEE ACCESS, 2021, 9 : 141144 - 141153
  • [20] Visual Relationship Detection with Contextual Information
    Li, Yugang
    Wang, Yongbin
    Chen, Zhe
    Zhu, Yuting
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03): : 1575 - 1589