Social Fabric: Tubelet Compositions for Video Relation Detection

Cited by: 5
Authors
Chen, Shuo [1]
Shi, Zenglin [1]
Mettes, Pascal [1]
Snoek, Cees G. M. [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
DOI
10.1109/ICCV48922.2021.01323
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives. These primitives are learned over all relations, resulting in a compact representation able to localize and classify relations from the pool of co-occurring object tubelets across all timespans in a video. The encoding enables our two-stage network. In the first stage, we train Social Fabric to suggest proposals that are likely interacting. We use the Social Fabric in the second stage to simultaneously fine-tune and predict predicate labels for the tubelets. Experiments demonstrate the benefit of early video relation modeling, our encoding and the two-stage architecture, leading to a new state-of-the-art on two benchmarks. We also show how the encoding enables query-by-primitive-example to search for spatio-temporal video relations. Code: https://github.com/shanshuo/Social-Fabric.
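The idea of representing a tubelet pair as a composition of interaction primitives can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked in the abstract for that). It assumes, purely for illustration, that each frame of a tubelet pair is described by a feature vector, that the primitives are vectors in the same space, and that the composition is the time-averaged soft assignment of frames to primitives. The names `soft_assign`, `compose`, and `temperature`, and the toy data, are hypothetical.

```python
import math


def soft_assign(feature, primitives, temperature=1.0):
    """Softmax over negative squared distances from one feature to each primitive."""
    dists = [sum((f - p) ** 2 for f, p in zip(feature, prim)) for prim in primitives]
    nearest = min(dists)  # subtract the smallest distance for numerical stability
    exps = [math.exp(-(d - nearest) / temperature) for d in dists]
    total = sum(exps)
    return [e / total for e in exps]


def compose(pair_features, primitives, temperature=1.0):
    """Average per-frame assignments into one K-dimensional composition vector.

    Assumes pair_features is non-empty (hypothetical per-frame pair features).
    """
    acc = [0.0] * len(primitives)
    for feature in pair_features:
        for i, w in enumerate(soft_assign(feature, primitives, temperature)):
            acc[i] += w
    return [a / len(pair_features) for a in acc]


# Toy data: two hypothetical primitives in a 2-D feature space,
# and three frames of a tubelet pair that mostly lie near the second one.
primitives = [[0.0, 0.0], [1.0, 1.0]]
pair_features = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]
weights = compose(pair_features, primitives)
```

Under these assumptions, `weights` is a compact fixed-length description of the pair regardless of how many frames it spans, which is what would make comparing or retrieving pairs by their dominant primitives straightforward.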
Pages: 13465-13474
Page count: 10
Related papers
50 in total
  • [21] Video Relation Detection via Multiple Hypothesis Association
    Su, Zixuan
    Shang, Xindi
    Chen, Jingjing
    Jiang, Yu-Gang
    Qiu, Zhiyong
    Chua, Tat-Seng
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3127 - 3135
  • [22] Video Visual Relation Detection via Iterative Inference
    Shang, Xindi
    Li, Yicong
    Xiao, Junbin
    Ji, Wei
    Chua, Tat-Seng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3654 - 3663
  • [23] Video Relation Detection with Spatio-Temporal Graph
    Qian, Xufeng
    Zhuang, Yueting
    Li, Yimeng
    Xiao, Shaoning
    Pu, Shiliang
    Xiao, Jun
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 84 - 93
  • [24] Entity Dependency Learning Network With Relation Prediction for Video Visual Relation Detection
    Zhang, Guoguang
    Tang, Yepeng
    Zhang, Chunjie
    Zheng, Xiaolong
    Zhao, Yao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12425 - 12436
  • [25] HUMAN DETECTION AND SOCIAL DISTANCING MEASUREMENT IN A VIDEO
    Saramas, Kosin
    Supratak, Akara
    Yimwadsana, Boonsit
    Kraisangka, Jidapa
    Noraset, Thanapon
    Kusakunniran, Worapan
    2022 19TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2022), 2022
  • [26] Three-Stream Action Tubelet Detector for Spatiotemporal Action Detection in Videos
    Wu, Yutang
    Wang, Hanli
    Li, Qinyu
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2018, PT II, 2018, 11165 : 296 - 306
  • [27] Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos
    Cores, Daniel
    Brea, Victor M.
    Mucientes, Manuel
    APPLIED INTELLIGENCE, 2023, 53 (01) : 1205 - 1217
  • [28] A Survey of Social Relation Understanding Based on Image and Video Information
    Wang Z.
    Wu B.
    Wang W.-Z.
    Teng Y.-Y.
    Shuai J.
    Xiao Y.-P.
    Bai T.
    Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (06) : 1168 - 1199
  • [29] The Social Fabric
    Pugh, Christina
    PLOUGHSHARES, 2016, 42 (04) : 109 - 109
  • [30] Social Fabric
    Sommer, Danielle
    TEXTILE-THE JOURNAL OF CLOTH & CULTURE, 2013, 11 (03) : 329 - 333