Is an Object-Centric Video Representation Beneficial for Transfer?

Cited by: 0
Authors
Zhang, Chuhan [1]
Gupta, Ankush [2]
Zisserman, Andrew [1]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England
[2] DeepMind, London, England
Source
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Video action recognition; Object centric representations; Transfer learning;
DOI
10.1007/978-3-031-26316-3_23
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object-centric video recognition model based on a transformer architecture. The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory 'modalities' of the video clip. We also introduce a novel trajectory contrast loss to further enhance objectness in these summary vectors. With experiments on four datasets (SomethingSomething-V2, Something-Else, Action Genome and EpicKitchens), we show that the object-centric model outperforms prior video representations (both object-agnostic and object-aware) when: (1) classifying actions on unseen objects and in unseen environments; (2) low-shot learning of novel classes; (3) linear probing on other downstream tasks; as well as (4) standard action classification.
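As an illustration of evaluation setting (3), the following is a minimal sketch of a generic linear probe: a softmax classifier trained on top of frozen features, with the feature extractor left untouched. It is not the authors' code. The function names, hyperparameters, and the synthetic two-cluster features standing in for video representations are all assumptions made for this example.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    """Fit a softmax linear classifier on frozen features.

    Only W and b are updated; the features are treated as fixed inputs,
    which is what distinguishes a linear probe from fine-tuning.
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Return the highest-scoring class index for each feature vector."""
    return (features @ W + b).argmax(axis=1)

# Hypothetical stand-in for frozen video features: two well-separated clusters.
rng = np.random.default_rng(0)
feats = np.concatenate([
    rng.normal(-2.0, 0.5, (50, 8)),
    rng.normal(2.0, 0.5, (50, 8)),
])
labels = np.array([0] * 50 + [1] * 50)

W, b = train_linear_probe(feats, labels, num_classes=2)
accuracy = (predict(feats, W, b) == labels).mean()
```

In a real evaluation the synthetic `feats` would be replaced by representations extracted once from a pre-trained, frozen video model, and accuracy would be measured on a held-out split rather than on the training data.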
Pages: 379-397
Page count: 19
Related papers
50 in total
  • [1] OCVOS: OBJECT-CENTRIC REPRESENTATION FOR VIDEO OBJECT SEGMENTATION
    Jo, Junho
    Wee, Dongyoon
    Cho, Nam Ik
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1655 - 1659
  • [2] Object-Centric Representation Learning for Video Scene Understanding
    Zhou, Yi
    Zhang, Hui
    Park, Seung-In
    Yoo, ByungIn
    Qi, Xiaojuan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8410 - 8423
  • [3] Object-Centric Representation Learning for Video Question Answering
    Long Hoang Dang
    Thao Minh Le
    Vuong Le
    Truyen Tran
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [4] Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation
    Zhou, Yi
    Zhang, Hui
    Lee, Hana
    Sun, Shuyang
    Li, Pingjun
    Zhu, Yangguang
    Yoo, ByungIn
    Qi, Xiaojuan
    Han, Jae-Joon
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3083 - 3093
  • [5] Object-Centric Diffusion for Efficient Video Editing
    Kahatapitiya, Kumara
    Karjauv, Adil
    Abati, Davide
    Porikli, Fatih
    Asano, Yuki M.
    Habibian, Amirhossein
    COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 91 - 108
  • [6] Object-centric Video Prediction without Annotation
    Schmeckpeper, Karl
    Georgakis, Georgios
    Daniilidis, Kostas
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13604 - 13610
  • [7] Learning Object-Centric Transformation for Video Prediction
    Chen, Xiongtao
    Wang, Wenmin
    Wang, Jinzhuo
    Li, Weimian
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1503 - 1511
  • [8] Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
    Fan, Ke
    Lei, Jingshi
    Qian, Xuelin
    Yu, Miaopeng
    Xiao, Tianjun
    He, Tong
    Zhang, Zheng
    Fu, Yanwei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1272 - 1281
  • [9] Language-Mediated, Object-Centric Representation Learning
    Wang, Ruocheng
    Mao, Jiayuan
    Gershman, Samuel J.
    Wu, Jiajun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2033 - 2046
  • [10] Object-Centric Representation Learning from Unlabeled Videos
    Gao, Ruohan
    Jayaraman, Dinesh
    Grauman, Kristen
    COMPUTER VISION - ACCV 2016, PT V, 2017, 10115 : 248 - 263