Video Visual Relation Detection With Contextual Knowledge Embedding

Authors
Cao, Qianwen [1, 2]
Huang, Heyan [1, 2]
Affiliations
[1] Beijing Inst Technol, Sch Comp, Beijing 100081, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge engineering; Computer vision; knowledge embedding; video understanding; video visual relation detection; visual relation tagging
DOI
10.1109/TKDE.2023.3270328
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video visual relation detection (VidVRD) aims to abstract structured relations in the form of <subject-predicate-object> from videos. The triple formulation makes the search space extremely large and the distribution unbalanced. Existing works usually predict relationships from visual, spatial, and semantic cues. Among them, semantic cues capture the semantic connections between objects, which are crucial for transferring knowledge across relations. However, most of these works extract semantic cues by simply mapping object labels to classified features, which ignores the contextual surroundings and results in poor performance on low-frequency relations. To alleviate these issues, we propose a novel network, termed Contextual Knowledge Embedded Relation Network (CKERN), which facilitates VidVRD by establishing contextual knowledge embeddings for the detected object pairs in relations from two aspects: commonsense attributes and prior linguistic dependencies. Specifically, we take the pair as a query to extract relational facts from a commonsense knowledge base, then encode them to explicitly construct semantic surroundings for relations. In addition, the statistics of object pairs with different predicates, distilled from large-scale visual relations, are taken into account to represent the linguistic regularity of relations. Extensive experimental results on benchmark datasets demonstrate the effectiveness and robustness of our proposed model.
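The "statistics of object pairs with different predicates" mentioned in the abstract can be pictured as a conditional frequency table over relation triples. The following is only a minimal sketch of that general idea, not the CKERN implementation: the function name `predicate_prior`, the toy triples, and the use of additive smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def predicate_prior(triples, smoothing=1.0):
    """Build an estimator of P(predicate | subject, object).

    `triples` is an iterable of (subject, predicate, object) labels.
    Additive smoothing is an illustrative assumption, not taken from the paper.
    """
    triples = list(triples)
    predicates = sorted({p for _, p, _ in triples})

    # Count how often each predicate occurs for each (subject, object) pair.
    counts = defaultdict(Counter)
    for s, p, o in triples:
        counts[(s, o)][p] += 1

    def prior(subject, obj):
        pair_counts = counts.get((subject, obj), Counter())
        total = sum(pair_counts.values()) + smoothing * len(predicates)
        return {p: (pair_counts[p] + smoothing) / total for p in predicates}

    return prior


# Hypothetical toy annotations, for illustration only.
triples = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "sit_on", "bicycle"),
    ("dog", "chase", "person"),
]
prior = predicate_prior(triples)
dist = prior("person", "bicycle")
unseen = prior("dog", "bicycle")
```

With smoothing, an object pair that never appears in the counts (such as `("dog", "bicycle")` here) falls back to a uniform distribution over predicates instead of receiving zero probability, which is one simple way to keep low-frequency relations from being ruled out.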
Pages: 13083-13095
Number of pages: 13