Is an Object-Centric Video Representation Beneficial for Transfer?

Times cited: 0

Authors:

Zhang, Chuhan [1]

Gupta, Ankush [2]

Zisserman, Andrew [1]

Affiliations:

[1] University of Oxford, Department of Engineering Science, Visual Geometry Group, Oxford, England

[2] DeepMind, London, England

Source:

COMPUTER VISION - ACCV 2022, PT IV | 2023 / Vol. 13844

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Video action recognition; Object-centric representations; Transfer learning;

DOI:

10.1007/978-3-031-26316-3_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object-centric video recognition model based on a transformer architecture. The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory 'modalities' of the video clip. We also introduce a novel trajectory contrast loss to further enhance objectness in these summary vectors. With experiments on four datasets (Something-Something-V2, Something-Else, Action Genome and EpicKitchens), we show that the object-centric model outperforms prior video representations (both object-agnostic and object-aware) when: (1) classifying actions on unseen objects and unseen environments; (2) low-shot learning of novel classes; (3) linear probing on other downstream tasks; and (4) performing standard action classification.
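The abstract does not state the form of the trajectory contrast loss. The sketch below is a rough, non-authoritative illustration of the general idea. It assumes that each object summary vector is paired with one trajectory embedding, and that the summary should be closer to its own object's trajectory than to any other object's. On that assumption it implements a generic InfoNCE-style contrastive loss. It is not the authors' formulation. The function names, the one-to-one pairing, and the temperature value are hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def contrastive_matching_loss(
    summaries: List[Vector],     # hypothetical: one summary vector per object slot
    trajectories: List[Vector],  # hypothetical: one trajectory embedding per object
    temperature: float = 0.1,    # assumed softmax temperature, not from the paper
) -> float:
    """Generic InfoNCE loss (an assumption, not the paper's exact loss).

    Summary i is treated as matching trajectory i; every trajectory j != i is
    a negative. Returns the mean cross-entropy over all object slots.
    """
    assert summaries and len(summaries) == len(trajectories)
    s = [normalize(v) for v in summaries]
    t = [normalize(v) for v in trajectories]
    total = 0.0
    for i, s_i in enumerate(s):
        # Cosine similarity of summary i to every trajectory, scaled by temperature.
        logits = [dot(s_i, t_j) / temperature for t_j in t]
        # Numerically stable log-sum-exp over all candidate trajectories.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the correct (diagonal) pairing.
        total += log_denom - logits[i]
    return total / len(s)


if __name__ == "__main__":
    summaries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = contrastive_matching_loss(summaries, [[1.0, 0.0], [0.0, 1.0]])
    swapped = contrastive_matching_loss(summaries, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True
```

When each summary vector points toward its own object's trajectory, the loss is close to zero. When the pairings are swapped, the loss is large. Minimising a loss of this kind therefore pushes each summary vector to encode object-specific information.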


Pages: 379-397

Number of pages: 19

50 records in total (first 10 shown):

[1] OCVOS: Object-Centric Representation for Video Object Segmentation
Jo, Junho
Wee, Dongyoon
Cho, Nam Ik
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1655 - 1659
[2] Object-Centric Representation Learning for Video Scene Understanding
Zhou, Yi
Zhang, Hui
Park, Seung-In
Yoo, ByungIn
Qi, Xiaojuan
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8410 - 8423
[3] Object-Centric Representation Learning for Video Question Answering
Dang, Long Hoang
Le, Thao Minh
Le, Vuong
Tran, Truyen
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
[4] Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation
Zhou, Yi
Zhang, Hui
Lee, Hana
Sun, Shuyang
Li, Pingjun
Zhu, Yangguang
Yoo, ByungIn
Qi, Xiaojuan
Han, Jae-Joon
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3083 - 3093
[5] Object-Centric Diffusion for Efficient Video Editing
Kahatapitiya, Kumara
Karjauv, Adil
Abati, Davide
Porikli, Fatih
Asano, Yuki M.
Habibian, Amirhossein
COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 91 - 108
[6] Object-centric Video Prediction without Annotation
Schmeckpeper, Karl
Georgakis, Georgios
Daniilidis, Kostas
2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13604 - 13610
[7] Learning Object-Centric Transformation for Video Prediction
Chen, Xiongtao
Wang, Wenmin
Wang, Jinzhuo
Li, Weimian
PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1503 - 1511
[8] Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
Fan, Ke
Lei, Jingshi
Qian, Xuelin
Yu, Miaopeng
Xiao, Tianjun
He, Tong
Zhang, Zheng
Fu, Yanwei
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1272 - 1281
[9] Language-Mediated, Object-Centric Representation Learning
Wang, Ruocheng
Mao, Jiayuan
Gershman, Samuel J.
Wu, Jiajun
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2033 - 2046
[10] Object-Centric Representation Learning from Unlabeled Videos
Gao, Ruohan
Jayaraman, Dinesh
Grauman, Kristen
COMPUTER VISION - ACCV 2016, PT V, 2017, 10115 : 248 - 263
