Combining Deep Deterministic Policy Gradient with Cross-Entropy Method

Cited: 0
Authors
Lai, Tung-Yi [1]
Hsueh, Chu-Hsuan [1,2]
Lin, You-Hsuan [1]
Chu, Yeong-Jia Roger [1]
Hsueh, Bo-Yang [1]
Wu, I-Chen [1,3]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa, Japan
[3] Pervas Artificial Intelligence Res PAIR Labs, Hsinchu, Taiwan
Keywords
reinforcement learning; robotics; object grasping; deep deterministic policy gradient; cross-entropy method
DOI
10.1109/taai48200.2019.8959942
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a deep reinforcement learning algorithm for solving robotic tasks such as object grasping, combining the cross-entropy method (CE) with deep deterministic policy gradient (DDPG). More specifically, whereas the standard CE method samples from a Gaussian distribution whose initial mean is zero, we instead set the initial mean to DDPG's output. The resulting algorithm is referred to as DDPG-CE. Next, to reduce the influence of bad samples, we improve DDPG-CE by substituting the CE component with a weighted CE method, resulting in the DDPG-WCE algorithm. Experiments show that DDPG-WCE achieves a higher success rate on grasping previously unseen objects than other approaches, including supervised learning, DDPG, CE, and DDPG-CE.
Pages: 5
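
For a concrete picture of the method the abstract describes, below is a minimal Python/NumPy sketch of the action-refinement loop: a CE search seeded with the DDPG actor's output rather than a zero mean, with an optional weighted-elite update for the WCE variant. All function names, hyperparameters, and the softmax weighting scheme are illustrative assumptions, not the authors' exact implementation.

import numpy as np

def ddpg_wce_action(actor, critic, state, action_dim,
                    n_samples=64, n_elites=8, n_iters=3,
                    init_std=0.3, weighted=True, rng=None):
    """Refine the DDPG actor's action with a (weighted) cross-entropy search.

    actor:  callable, state -> action (the DDPG policy output); stand-in here
    critic: callable, (state, actions) -> Q-values of shape (n_samples,)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Key idea from the paper: initialize the CE Gaussian's mean at the
    # DDPG actor's output instead of zero.
    mean = np.asarray(actor(state), dtype=np.float64)
    std = np.full(action_dim, init_std)

    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, action_dim))
        scores = np.asarray(critic(state, samples))   # Q(s, a) per sample
        elite_idx = np.argsort(scores)[-n_elites:]    # keep the top-Q samples
        elites, elite_scores = samples[elite_idx], scores[elite_idx]

        if weighted:
            # Weighted CE (DDPG-WCE): down-weight poor elites via a softmax
            # over their Q-values (one plausible weighting; the paper's
            # exact scheme may differ).
            w = np.exp(elite_scores - elite_scores.max())
            w /= w.sum()
            mean = (w[:, None] * elites).sum(axis=0)
            std = np.sqrt((w[:, None] * (elites - mean) ** 2).sum(axis=0)) + 1e-6
        else:
            # Plain CE (DDPG-CE): unweighted elite statistics.
            mean = elites.mean(axis=0)
            std = elites.std(axis=0) + 1e-6
    return mean

if __name__ == "__main__":
    # Toy check with stand-in networks: the actor returns zeros and the
    # critic prefers actions close to a fixed target, so the search should
    # converge toward that target.
    target = np.array([0.5, -0.2])
    actor = lambda s: np.zeros(2)
    critic = lambda s, a: -np.sum((a - target) ** 2, axis=1)
    print(ddpg_wce_action(actor, critic, state=None, action_dim=2))

The toy check illustrates why seeding matters: when the actor's output is already near a good action, the CE refinement needs few samples and iterations to improve on it, which is the efficiency argument implicit in the abstract.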