Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

Cited by: 1
Authors
Heravi, Negin [1, 2]
Wahid, Ayzaan [3]
Lynch, Corey [3]
Florence, Pete [3]
Armstrong, Travis [3]
Tompson, Jonathan [3]
Sermanet, Pierre [3]
Bohg, Jeannette [2]
Dwibedi, Debidatta [3]
Affiliations
[1] Google, Mountain View, CA 94043 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Google, Robot, Mountain View, CA 94043 USA
DOI
10.1109/ICRA48891.2023.10160888
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large, labeled datasets for each task that are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks as they fail to capture the complexity of scenes with many components. In this paper, we show the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20% increase in performance in low-data regimes (1000 trajectories) in policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines for the task of object localization in multi-object scenes. Further qualitative results are available at https://sites.google.com/view/slots4robots.
Pages: 9515-9522
Number of pages: 8