Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Times cited: 0

Authors:

Huang, Tao [1,2,5]

Jiang, Guangqi [1,3]

Ze, Yanjie [1]

Xu, Huazhe [1,4,5]

Affiliations:

[1] Shanghai Qi Zhi Inst, Shanghai, Peoples R China

[2] Chinese Univ Hong Kong, Sha Tin, Hong Kong, Peoples R China

[3] Sichuan Univ, Chengdu, Peoples R China

[4] Tsinghua Univ, IIIS, Beijing, Peoples R China

[5] Shanghai AI Lab, Shanghai, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT XLII | 2025 / Vol. 15100

Funding:

National Key R&D Program of China

Keywords:

Reward Learning from Videos; Robotic Manipulation; Visual Reinforcement Learning

DOI:

10.1007/978-3-031-72946-1_27

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning rewards from expert videos offers an affordable and effective way to specify intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models to solve complex visual RL problems. Our key insight is that a video diffusion model exhibits lower generative diversity when conditioned on expert trajectories. Accordingly, Diffusion Reward is formalized as the negative of the conditional entropy, which encourages productive exploration of expert behaviors. We demonstrate the efficacy of our method on robotic manipulation tasks with visual input, both on simulation platforms and in the real world. Moreover, Diffusion Reward can even solve unseen tasks successfully and effectively, substantially outperforming baseline methods. Project page and code: diffusion-reward.github.io.
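The core idea of the abstract (lower generative diversity under expert conditioning, so the negative conditional entropy can serve as a reward) can be illustrated with a toy sketch. This is not the authors' implementation, and this record does not say how they estimate the conditional entropy. Here we assume, purely for illustration, a hypothetical `toy_conditional_sampler` that stands in for a conditional video model by returning feature vectors for the predicted next frame, and we substitute a simple estimator: fit a diagonal Gaussian to the samples and use its differential entropy. The names `toy_conditional_sampler`, `gaussian_entropy`, and `entropy_reward` are all hypothetical.

```python
import math
import random
import statistics


def gaussian_entropy(samples: list[list[float]]) -> float:
    """Differential entropy (nats) of a diagonal Gaussian fitted to the samples.

    Each sample is a feature vector; dimensions are treated as independent.
    This is an illustrative estimator, not the paper's.
    """
    dims = len(samples[0])
    entropy = 0.0
    for d in range(dims):
        var = statistics.pvariance([s[d] for s in samples])
        var = max(var, 1e-8)  # guard against log(0) for degenerate samples
        entropy += 0.5 * math.log(2 * math.pi * math.e * var)
    return entropy


def toy_conditional_sampler(
    history_is_expert: bool, n: int, dim: int, rng: random.Random
) -> list[list[float]]:
    """Hypothetical stand-in for a conditional video generator.

    Expert-like histories yield tightly clustered predictions (low diversity);
    other histories yield widely spread predictions (high diversity).
    """
    std = 0.1 if history_is_expert else 1.0
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n)]


def entropy_reward(
    history_is_expert: bool, n_samples: int = 256, dim: int = 8, seed: int = 0
) -> float:
    """Reward = negative (estimated) conditional entropy of the generated samples."""
    rng = random.Random(seed)
    samples = toy_conditional_sampler(history_is_expert, n_samples, dim, rng)
    return -gaussian_entropy(samples)


if __name__ == "__main__":
    print("expert-like history:", entropy_reward(True))
    print("other history:      ", entropy_reward(False))
```

With the same seed, the expert-conditioned samples are the diverse samples scaled by 0.1, so the expert-like history receives a reward about 8 ln 10 (roughly 18.4 nats) higher across the 8 feature dimensions. In a real system the sampler would be a trained conditional video diffusion model and the entropy estimator would need to handle high-dimensional video, which this sketch does not attempt.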


Pages: 478-495

Page count: 18
