Social Fabric: Tubelet Compositions for Video Relation Detection

Cited by: 5

Authors:

Chen, Shuo [1]

Shi, Zenglin [1]

Mettes, Pascal [1]

Snoek, Cees G. M. [1]

Affiliation:

[1] Univ Amsterdam, Amsterdam, Netherlands

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.01323

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives. These primitives are learned over all relations, resulting in a compact representation able to localize and classify relations from the pool of co-occurring object tubelets across all timespans in a video. The encoding enables our two-stage network. In the first stage, we train Social Fabric to suggest proposals that are likely interacting. We use the Social Fabric in the second stage to simultaneously fine-tune and predict predicate labels for the tubelets. Experiments demonstrate the benefit of early video relation modeling, our encoding and the two-stage architecture, leading to a new state-of-the-art on two benchmarks. We also show how the encoding enables query-by-primitive-example to search for spatio-temporal video relations. Code: https://github.com/shanshuo/Social-Fabric.

Pages: 13465-13474

Page count: 10
