Social Fabric: Tubelet Compositions for Video Relation Detection

Cited by: 5
Authors
Chen, Shuo [1]
Shi, Zenglin [1]
Mettes, Pascal [1]
Snoek, Cees G. M. [1]
Affiliation
[1] Univ Amsterdam, Amsterdam, Netherlands
DOI
10.1109/ICCV48922.2021.01323
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives. These primitives are learned over all relations, resulting in a compact representation able to localize and classify relations from the pool of co-occurring object tubelets across all timespans in a video. The encoding enables our two-stage network. In the first stage, we train Social Fabric to suggest proposals that are likely interacting. We use the Social Fabric in the second stage to simultaneously fine-tune and predict predicate labels for the tubelets. Experiments demonstrate the benefit of early video relation modeling, our encoding and the two-stage architecture, leading to a new state-of-the-art on two benchmarks. We also show how the encoding enables query-by-primitive-example to search for spatio-temporal video relations. Code: https://github.com/shanshuo/Social-Fabric.
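To make the abstract's encoding idea concrete, the following is a minimal illustrative sketch, not the authors' implementation (see the linked repository for that): a pair of object tubelets is soft-assigned, frame by frame, to a shared bank of learned interaction primitives and pooled over time into one compact pair encoding. The module name PrimitiveComposition, the feature dimension, the number of primitives, and the mean-pooling step are all assumptions made for this sketch.

# Minimal sketch of the core idea: represent a tubelet pair as a soft
# composition over a bank of learned interaction primitives.
# Not the authors' code; names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrimitiveComposition(nn.Module):
    """Encode per-frame features of a tubelet pair as a composition of primitives."""

    def __init__(self, feat_dim: int = 1024, num_primitives: int = 20):
        super().__init__()
        # Bank of learnable interaction primitives shared across all relations.
        self.primitives = nn.Parameter(torch.randn(num_primitives, feat_dim))
        # Projects concatenated subject/object features into the primitive space.
        self.pair_proj = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, subj_feats: torch.Tensor, obj_feats: torch.Tensor) -> torch.Tensor:
        # subj_feats, obj_feats: (T, feat_dim) per-frame features of the two tubelets.
        pair = self.pair_proj(torch.cat([subj_feats, obj_feats], dim=-1))  # (T, D)
        # Soft-assign every frame of the pair to the primitive bank.
        assign = F.softmax(pair @ self.primitives.t(), dim=-1)             # (T, K)
        # Compose a weighted sum of primitives per frame, then pool over time
        # to obtain one compact encoding for the whole tubelet pair.
        composed = assign @ self.primitives                                # (T, D)
        return composed.mean(dim=0)                                        # (D,)


if __name__ == "__main__":
    enc = PrimitiveComposition()
    s = torch.randn(30, 1024)   # 30-frame subject tubelet features (assumed)
    o = torch.randn(30, 1024)   # 30-frame object tubelet features (assumed)
    print(enc(s, o).shape)      # torch.Size([1024])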
Pages: 13465-13474
Number of pages: 10
Related Papers (50 in total)
  • [1] Zhao, Jiaojiao; Zhang, Yanyi; Li, Xinyu; Chen, Hao; Shuai, Bing; Xu, Mingze; Liu, Chunhui; Kundu, Kaustav; Xiong, Yuanjun; Modolo, Davide; Marsic, Ivan; Snoek, Cees G. M.; Tighe, Joseph. TubeR: Tubelet Transformer for Video Action Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 13588-13597.
  • [2] Wang, Bin; Tang, Sheng; Xiao, Jun-Bin; Yan, Quan-Feng; Zhang, Yong-Dong. Detection and tracking based tubelet generation for video object detection. Journal of Visual Communication and Image Representation, 2019, 58: 102-111.
  • [3] Wu, Yutang; Wang, Hanli; Wang, Shuheng; Li, Qinyu. Enhanced Action Tubelet Detector for Spatio-Temporal Video Action Detection. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 2388-2392.
  • [4] Tu, Danyang; Sun, Wei; Min, Xiongkuo; Zhai, Guangtao; Shen, Wei. Video-based Human-Object Interaction Detection from Tubelet Tokens. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [5] Kang, Kai; Li, Hongsheng; Xiao, Tong; Ouyang, Wanli; Yan, Junjie; Liu, Xihui; Wang, Xiaogang. Object Detection in Videos with Tubelet Proposal Networks. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 889-897.
  • [6] Shang, Xindi; Ren, Tongwei; Guo, Jingfan; Zhang, Hanwang; Chua, Tat-Seng. Video Visual Relation Detection. Proceedings of the 2017 ACM Multimedia Conference (MM'17), 2017: 1300-1308.
  • [7] Li, Yicong; Yang, Xun; Shang, Xindi; Chua, Tat-Seng. Interventional Video Relation Detection. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 4091-4099.
  • [8] Cao, Qianwen; Huang, Heyan. Attention Guided Relation Detection Approach for Video Visual Relation Detection. IEEE Transactions on Multimedia, 2022, 24: 3896-3907.
  • [9] Wang, Xiangyang; Yang, Kun; Ding, Qiang; Wang, Rui; Sun, Jinhua. TQRFormer: Tubelet query recollection transformer for action detection. Image and Vision Computing, 2024, 147.
  • [10] Li, Dong; Qiu, Zhaofan; Dai, Qi; Yao, Ting; Mei, Tao. Recurrent Tubelet Proposal and Recognition Networks for Action Detection. Computer Vision - ECCV 2018, Part VI, 2018, 11210: 306-322.