Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Cited by: 0
Authors
Huang, Tao [1, 2, 5]
Jiang, Guangqi [1, 3]
Ze, Yanjie [1]
Xu, Huazhe [1, 4, 5]
Affiliations
[1] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
[3] Sichuan Univ, Chengdu, Peoples R China
[4] Tsinghua Univ, IIIS, Beijing, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Source
Funding
National Key Research and Development Program of China
Keywords
Reward Learning from Videos; Robotic Manipulation; Visual Reinforcement Learning
DOI
10.1007/978-3-031-72946-1_27
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that lower generative diversity is exhibited when conditioning diffusion on expert trajectories. Diffusion Reward is accordingly formalized by the negative of conditional entropy that encourages productive exploration of expert behaviors. We show the efficacy of our method over robotic manipulation tasks in both simulation platforms and the real world with visual input. Moreover, Diffusion Reward can even solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: diffusion-reward.github.io.
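
The abstract's central idea, a reward defined as the negative of a conditional entropy, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical conditional generative model has already been sampled several times for the same history, that each sample has been reduced to a discrete label, and that a simple plug-in estimator stands in for the entropy. The sample lists and function names below are made up for illustration.

```python
import math
from collections import Counter


def empirical_entropy(samples):
    """Plug-in entropy estimate (in nats) over a list of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def negative_entropy_reward(samples):
    """Toy reward: the negative of the estimated entropy of the samples.

    Low diversity among the samples gives a reward near zero (the maximum);
    high diversity gives a more negative reward.
    """
    return -empirical_entropy(samples)


# Hypothetical discretized predictions drawn for two different histories.
expert_like = ["reach", "reach", "reach", "reach"]    # samples agree
non_expert = ["reach", "drift", "stall", "retreat"]   # samples disagree

r_expert = negative_entropy_reward(expert_like)
r_other = negative_entropy_reward(non_expert)
print(r_expert > r_other)
```

In this sketch the history whose samples agree receives the higher reward, which mirrors the abstract's observation that conditioning on expert trajectories yields lower generative diversity.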
Pages: 478-495
Page count: 18