Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Times cited: 0
Authors
Huang, Tao [1,2,5]
Jiang, Guangqi [1,3]
Ze, Yanjie [1]
Xu, Huazhe [1,4,5]
Affiliations
[1] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China
[3] Sichuan Univ, Chengdu, Peoples R China
[4] Tsinghua Univ, IIIS, Beijing, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China
Keywords
Reward Learning from Videos; Robotic Manipulation; Visual Reinforcement Learning
DOI
10.1007/978-3-031-72946-1_27
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning rewards from expert videos offers an affordable and effective way to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that a diffusion model exhibits lower generative diversity when it is conditioned on expert trajectories. Diffusion Reward is accordingly formalized as the negative of the conditional entropy, which encourages productive exploration of expert behaviors. We show the efficacy of our method on robotic manipulation tasks with visual input, both on simulation platforms and in the real world. Moreover, Diffusion Reward can even solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: diffusion-reward.github.io.
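The core idea of the abstract, a reward equal to the negative conditional entropy, can be illustrated with a toy sketch. Purely for illustration, it replaces the conditional video diffusion model with a hypothetical one-dimensional Gaussian predictor (`toy_conditional_model`, with a made-up `expertness` feature) whose spread shrinks for expert-like context. This is not the authors' implementation, which the abstract does not describe. The conditional entropy H(x | c) = -E[log p(x | c)] is estimated by Monte Carlo sampling, and the reward is its negative, so sharper (less diverse) predictions earn a higher reward.

```python
import math
import random


def gaussian_log_prob(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def toy_conditional_model(context):
    """Hypothetical stand-in for a conditional video diffusion model.

    Returns (mean, std) of a predicted next "frame" (a scalar here).
    Assumption: expert-like context yields a sharper, lower-variance
    prediction, mimicking the "lower generative diversity" insight.
    """
    expertness = context["expertness"]  # hypothetical feature in [0, 1]
    mean = sum(context["frames"]) / len(context["frames"])
    std = 1.0 - 0.9 * expertness  # 1.0 for non-expert context, 0.1 for expert
    return mean, std


def entropy_reward(context, num_samples=1000, seed=0):
    """Reward = -H(x | c), with H estimated by Monte Carlo sampling."""
    rng = random.Random(seed)
    mean, std = toy_conditional_model(context)
    total_log_prob = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mean, std)  # sample x_i ~ p(x | c)
        total_log_prob += gaussian_log_prob(x, mean, std)
    # H(x | c) ~= -(1/N) * sum(log p(x_i | c)), so -H is the mean log-likelihood.
    return total_log_prob / num_samples


if __name__ == "__main__":
    expert = {"frames": [0.2, 0.3, 0.25], "expertness": 1.0}
    novice = {"frames": [0.2, 0.3, 0.25], "expertness": 0.0}
    print(entropy_reward(expert), entropy_reward(novice))
```

For a Gaussian the entropy has the closed form 0.5 * log(2 * pi * e * std^2), so the estimates should land near 0.88 for the expert-like context (std 0.1) and near -1.42 for the non-expert one (std 1.0).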
Pages: 478-495
Number of pages: 18