Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Cited by: 0
Authors
Huang, Tao [1, 2, 5]
Jiang, Guangqi [1, 3]
Ze, Yanjie [1]
Xu, Huazhe [1, 4, 5]
Affiliations
[1] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
[3] Sichuan Univ, Chengdu, Peoples R China
[4] Tsinghua Univ, IIIS, Beijing, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Source
Funding
National Key Research and Development Program of China;
Keywords
Reward Learning from Videos; Robotic Manipulation; Visual Reinforcement Learning;
DOI
10.1007/978-3-031-72946-1_27
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that lower generative diversity is exhibited when conditioning diffusion on expert trajectories. Diffusion Reward is accordingly formalized by the negative of conditional entropy that encourages productive exploration of expert behaviors. We show the efficacy of our method over robotic manipulation tasks in both simulation platforms and the real world with visual input. Moreover, Diffusion Reward can even solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: diffusion-reward.github.io.
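The abstract's central idea, a reward equal to the negative of conditional entropy, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not say how the entropy is estimated, so this sketch fits a diagonal Gaussian to a set of sampled predictions, and `toy_sampler` is a hypothetical stand-in for drawing next-frame samples from a conditional video diffusion model.

```python
import math
import random


def gaussian_entropy(samples):
    """Entropy (in nats) of a diagonal Gaussian fitted to equal-length vectors.

    Assumption: a diagonal Gaussian fit is used only as a simple stand-in
    for an entropy estimate; the abstract does not specify the estimator.
    """
    n = len(samples)
    dim = len(samples[0])
    entropy = 0.0
    for d in range(dim):
        column = [s[d] for s in samples]
        mean = sum(column) / n
        # Small floor keeps the log finite when all samples coincide.
        var = sum((x - mean) ** 2 for x in column) / n + 1e-8
        entropy += 0.5 * math.log(2.0 * math.pi * math.e * var)
    return entropy


def entropy_reward(samples):
    """Reward as the negative entropy of the sampled predictions."""
    return -gaussian_entropy(samples)


def toy_sampler(spread, num_samples=64, dim=4, seed=0):
    """Hypothetical stand-in for conditional diffusion sampling.

    `spread` mimics generative diversity: small for expert-like context,
    large for context the model has not seen in expert videos.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, spread) for _ in range(dim)]
            for _ in range(num_samples)]


expert_like = toy_sampler(spread=0.1)      # low diversity
non_expert_like = toy_sampler(spread=1.0)  # high diversity

reward_expert = entropy_reward(expert_like)
reward_non_expert = entropy_reward(non_expert_like)
print(reward_expert > reward_non_expert)
```

In this toy setting the low-diversity samples receive the higher reward, which mirrors the stated insight that conditioning on expert trajectories yields lower generative diversity.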
Pages: 478-495
Page count: 18
Related papers
50 in total
  • [1] Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
    Yuan, Hui
    Huang, Kaixuan
    Ni, Chengzhuo
    Chen, Minshuo
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Spatiotemporal Fusion via Conditional Diffusion Model
    Ma, Yaobin
    Wang, Qi
    Wei, Jingbo
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [3] Conditional generative diffusion deep learning for accelerated diffusion tensor and kurtosis imaging
    Martin, Phillip
    Altbach, Maria
    Bilgin, Ali
    MAGNETIC RESONANCE IMAGING, 2025, 117
  • [4] MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
    Voleti, Vikram
    Jolicoeur-Martineau, Alexia
    Pal, Christopher
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] Generative adversarial defense via conditional diffusion model
    Shi, Xiaowen
    Zhou, Chao
    Wang, Yuan-Gen
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [6] Conditional persistence in logistic models via nonlinear diffusion
    Cantrell, RS
    Cosner, C
    PROCEEDINGS OF THE ROYAL SOCIETY OF EDINBURGH SECTION A-MATHEMATICS, 2002, 132 : 267 - 281
  • [7] Evading DeepFake Detectors via Conditional Diffusion Models
    Wang, Wenhao
    Huang, Fangjun
    PROCEEDINGS OF THE 2024 ACM WORKSHOP ON INFORMATION HIDING AND MULTIMEDIA SECURITY, IH&MMSEC 2024, 2024, : 159 - 164
  • [8] Incentivize Diffusion with Fair Rewards
    Zhang, Wen
    Zhao, Dengji
    Zhang, Yao
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 251 - 258
  • [9] Diffusion policy: Visuomotor policy learning via action diffusion
    Chi, Cheng
    Xu, Zhenjia
    Feng, Siyuan
    Cousineau, Eric
    Du, Yilun
    Burchfiel, Benjamin
    Tedrake, Russ
    Song, Shuran
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2024,
  • [10] Video Editing via Factorized Diffusion Distillation
    Singer, Uriel
    Zohar, Amit
    Kirstain, Yuval
    Sheynin, Shelly
    Polyak, Adam
    Parikh, Devi
    Taigman, Yaniv
    COMPUTER VISION - ECCV 2024, PT LXXVI, 2025, 15134 : 450 - 466