Self-Supervised Disentangled Representation Learning for Third-Person Imitation Learning

Cited by: 9
Authors
Shang, Jinghuan [1 ]
Ryoo, Michael S. [1 ]
Affiliations
[1] SUNY Stony Brook, Dept Comp Sci, 100 Nicolls Rd, Stony Brook, NY 11794 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Representation learning; imitation learning
DOI
10.1109/IROS51168.2021.9636363
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Humans learn to imitate by observing others. However, robot imitation learning generally requires expert demonstrations in the first-person view (FPV). Collecting such FPV videos for every robot could be very expensive. Third-person imitation learning (TPIL) is the concept of learning action policies by observing other agents in a third-person view (TPV), similar to what humans do. This ultimately allows human and robot demonstration videos in TPV from many different data sources to be used for policy learning. In this paper, we present a TPIL approach for robot tasks with egomotion. Although many robot tasks with ground/aerial mobility involve actions with camera egomotion, research on TPIL for such tasks has been limited. Here, FPV and TPV observations are visually very different: FPV shows egomotion, while the agent's appearance is only observable in TPV. To enable better state learning for TPIL, we propose a disentangled representation learning method. It uses a dual auto-encoder structure together with a representation permutation loss and a time-contrastive loss to ensure that the state and viewpoint representations are well disentangled. Our experiments demonstrate the effectiveness of the approach.
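Based only on the abstract's description, the sketch below illustrates how a dual auto-encoder with a representation-permutation reconstruction term and a time-contrastive term could be wired up in PyTorch. All module names, the 3x32x32 input size, latent dimensions, and the unweighted loss sum are illustrative assumptions, not the authors' implementation; see the paper at the DOI above for the actual method.

```python
# Minimal sketch (assumptions noted above): paired FPV/TPV frames of the same
# time step, plus a temporally distant FPV frame as the contrastive negative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAutoEncoder(nn.Module):
    def __init__(self, state_dim=64, view_dim=16):
        super().__init__()
        # One encoder per viewpoint (FPV, TPV), same architecture.
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, state_dim + view_dim),
            )
        self.enc_fpv, self.enc_tpv = encoder(), encoder()
        # A shared decoder reconstructs a 3x32x32 frame from (state, viewpoint) codes.
        self.dec = nn.Sequential(
            nn.Linear(state_dim + view_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        self.state_dim = state_dim

    def split(self, z):
        # Split a latent vector into its state and viewpoint parts.
        return z[:, :self.state_dim], z[:, self.state_dim:]

    def forward(self, x_fpv, x_tpv):
        s_f, v_f = self.split(self.enc_fpv(x_fpv))
        s_t, v_t = self.split(self.enc_tpv(x_tpv))
        return s_f, v_f, s_t, v_t

def disentangle_loss(model, x_fpv, x_tpv, x_fpv_neg, margin=1.0):
    """Reconstruction + permutation + time-contrastive terms (illustrative)."""
    s_f, v_f, s_t, v_t = model(x_fpv, x_tpv)
    # Plain reconstruction of each view from its own codes.
    rec = F.mse_loss(model.dec(torch.cat([s_f, v_f], 1)), x_fpv) \
        + F.mse_loss(model.dec(torch.cat([s_t, v_t], 1)), x_tpv)
    # Permutation: swap state codes across views; if state and viewpoint are
    # disentangled, reconstruction should still succeed.
    perm = F.mse_loss(model.dec(torch.cat([s_t, v_f], 1)), x_fpv) \
         + F.mse_loss(model.dec(torch.cat([s_f, v_t], 1)), x_tpv)
    # Time-contrastive: same-time states agree across views, while a
    # temporally distant frame (x_fpv_neg) is pushed apart in state space.
    s_neg, _ = model.split(model.enc_fpv(x_fpv_neg))
    tc = F.triplet_margin_loss(s_t, s_f, s_neg, margin=margin)
    return rec + perm + tc
```

In this sketch the permutation term is what forces the viewpoint code to carry no state information (and vice versa), while the triplet term aligns same-time states across FPV and TPV; a real training loop would likely weight the three terms separately.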
Pages: 214-221
Number of pages: 8
Related papers
50 records in total
  • [1] Self-Supervised Learning Disentangled Group Representation as Feature
    Wang, Tan
    Yue, Zhongqi
    Huang, Jianqiang
    Sun, Qianru
    Zhang, Hanwang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Learning disentangled representation for self-supervised video object segmentation
    Hou, Wenjie
    Qin, Zheyun
    Xi, Xiaoming
    Lu, Xiankai
    Yin, Yilong
    NEUROCOMPUTING, 2022, 481 : 270 - 280
  • [3] Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
    Mu, Zhaoxi
    Yang, Xinyu
    Sun, Sining
    Yang, Qing
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18815 - 18823
  • [4] Pose-disentangled Contrastive Learning for Self-supervised Facial Representation
    Liu, Yuanyuan
    Wang, Wenbin
    Zhan, Yibing
    Feng, Shaoze
    Liu, Kejun
    Chen, Zhe
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9717 - 9728
  • [5] Humanlike Behavior in a Third-Person Shooter with Imitation Learning
    Farhang, Alexander R.
    Mulcahy, Brendan
    Holden, Daniel
    Matthews, Iain
    Yue, Yisong
    2024 IEEE CONFERENCE ON GAMES, COG 2024, 2024,
  • [6] Self-Supervised Adversarial Imitation Learning
    Monteiro, Juarez
    Gavenski, Nathan
    Meneguzzi, Felipe
    Barros, Rodrigo C.
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] Self-supervised disentangled representation learning with distribution alignment for multi-view clustering
    Shu, Zhenqiu
    Sun, Teng
    Yu, Zhengtao
    DIGITAL SIGNAL PROCESSING, 2025, 161
  • [8] Whitening for Self-Supervised Representation Learning
    Ermolov, Aleksandr
    Siarohin, Aliaksandr
    Sangineto, Enver
    Sebe, Nicu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Self-Supervised Representation Learning for CAD
    Jones, Benjamin T.
    Hu, Michael
    Kodnongbua, Milin
    Kim, Vladimir G.
    Schulz, Adriana
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21327 - 21336