Language-Mediated, Object-Centric Representation Learning

Authors
Wang, Ruocheng [1]
Mao, Jiayuan [2]
Gershman, Samuel J. [3]
Wu, Jiajun
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] MIT CSAIL, Cambridge, MA USA
[3] Harvard Univ, Cambridge, MA 02138 USA
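Keywords
INDIVIDUATION
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract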
We present Language-mediated, Object-centric Representation Learning (LORL), a paradigm for learning disentangled, object-centric scene representations from vision and language. LORL builds upon recent advances in unsupervised object discovery and segmentation, notably MONet and Slot Attention. While these algorithms learn an object-centric representation solely by reconstructing the input image, LORL enables them to further learn to associate the learned representations with concepts, i.e., words for object categories, properties, and spatial relationships, from language input. These object-centric concepts derived from language in turn facilitate the learning of object-centric representations. LORL can be integrated with various unsupervised object discovery algorithms that are language-agnostic. Experiments show that integrating LORL consistently improves the performance of unsupervised object discovery methods on two datasets with the help of language. We also show that concepts learned by LORL, in conjunction with object discovery methods, aid downstream tasks such as referring expression comprehension.
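As a rough illustration of what "associating learned representations with concepts" can look like, the sketch below scores per-object vectors against word-level concept vectors and picks the best match, which is the basic step behind a task such as referring expression comprehension. It is not the authors' implementation: the cosine-similarity scoring, the function names, and the three-dimensional toy vectors are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_object(slots, concept_vec):
    """Return the index of the object slot that best matches a concept.

    `slots` stands in for per-object representations produced by an
    object discovery model; `concept_vec` stands in for a learned
    embedding of a word. Both are hypothetical toy values here.
    """
    scores = [cosine(slot, concept_vec) for slot in slots]
    return max(range(len(slots)), key=scores.__getitem__)

# Toy scene with three objects (hypothetical slot vectors).
slots = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.2, 0.9],
]

# Toy concept embeddings for two words (hypothetical).
concepts = {
    "red": [1.0, 0.0, 0.0],
    "cube": [0.0, 0.0, 1.0],
}

red_index = select_object(slots, concepts["red"])
cube_index = select_object(slots, concepts["cube"])
```

In a trained system both the slot vectors and the concept vectors would be learned jointly; here they are fixed by hand only to show the matching step.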
Pages: 2033-2046
Page count: 14