Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Cited by: 0
Authors
Huang, Tao [1 ,2 ,5 ]
Jiang, Guangqi [1 ,3 ]
Ze, Yanjie [1 ]
Xu, Huazhe [1 ,4 ,5 ]
Affiliations
[1] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
[3] Sichuan Univ, Chengdu, Peoples R China
[4] Tsinghua Univ, IIIS, Beijing, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Source
Funding
National Key Research and Development Program of China;
Keywords
Reward Learning from Videos; Robotic Manipulation; Visual Reinforcement Learning;
DOI
10.1007/978-3-031-72946-1_27
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Learning rewards from expert videos offers an affordable and effective way to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models to solve complex visual RL problems. Our key insight is that the diffusion model exhibits lower generative diversity when conditioned on expert trajectories. Diffusion Reward is accordingly formalized as the negative conditional entropy, which encourages productive exploration of expert-like behaviors. We demonstrate the efficacy of our method on robotic manipulation tasks with visual input, both in simulation and in the real world. Moreover, Diffusion Reward can solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: diffusion-reward.github.io.
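As a rough illustration of the formalization sketched in the abstract (the notation below is ours, assumed for exposition, and not necessarily the paper's), the reward can be read as the negative conditional entropy of future frames under a video diffusion model trained on expert videos and conditioned on the frames observed so far:

```latex
% Illustrative sketch only; symbols are assumptions, not taken from the paper.
% p_\theta(x_t \mid x_{<t}) : conditional video diffusion model fit to expert videos,
%                             conditioned on the observation history x_{<t}.
r(x_{<t}) \;=\; -\,H_\theta\!\left(x_t \mid x_{<t}\right)
          \;=\; \mathbb{E}_{x_t \sim p_\theta(\cdot \mid x_{<t})}
                \!\left[\log p_\theta(x_t \mid x_{<t})\right]
```

In this reading, histories that resemble the expert data make the conditional model confident (low generative diversity), so the estimated entropy is low and the reward is high; in practice, the conditional log-likelihood would be lower-bounded by the diffusion model's variational (denoising) objective.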
Pages: 478-495
Page count: 18
Related Papers
50 in total
  • [41] Zhao, Xiaochuan; Sayed, Ali H. Learning Over Social Networks via Diffusion Adaptation. 2012 Conference Record of the Forty-Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2012: 709-713.
  • [42] Kim, Woo Kyung; Yoo, Minjong; Woo, Honguk. Robust Policy Learning via Offline Skill Diffusion. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 12, 2024: 13177-13184.
  • [43] Kim, Gyeongman; Shim, Hajin; Kim, Hyunsu; Choi, Yunjey; Kim, Junho; Yang, Eunho. Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 6091-6100.
  • [44] Cui, Xiaotang; Feng, Xiao; Sun, Siwen. Learning to Generate Urban Design Images From the Conditional Latent Diffusion Model. IEEE Access, 2024, 12: 89135-89143.
  • [45] Wahl, Benjamin; Feudel, Ulrike; Hlinka, Jaroslav; Wächter, Matthias; Peinke, Joachim; Freund, Jan A. Conditional Granger causality of diffusion processes. The European Physical Journal B, 2017, 90.
  • [46] Ewens, W. J. Conditional diffusion processes in population genetics. Theoretical Population Biology, 1973, 4(1): 21-30.
  • [47] Aase, K. K. Conditional expectation formula for diffusion processes. Journal of Applied Probability, 1977, 14(3): 626-629.
  • [48] Wahl, Benjamin; Feudel, Ulrike; Hlinka, Jaroslav; Wächter, Matthias; Peinke, Joachim; Freund, Jan A. Conditional Granger causality of diffusion processes. European Physical Journal B, 2017, 90(10).
  • [49] Jiang, Hao; Mu, Yadong. Conditional Diffusion Process for Inverse Halftoning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [50] Chen, Ruixin; Fan, Jianping; Wu, Meiqin; Ma, Sining. Conditional diffusion model for recommender systems. Neural Networks, 2025, 185.