Combining Deep Deterministic Policy Gradient with Cross-Entropy Method

Cited: 0
Authors
Lai, Tung-Yi [1]
Hsueh, Chu-Hsuan [1,2]
Lin, You-Hsuan [1]
Chu, Yeong-Jia Roger [1]
Hsueh, Bo-Yang [1]
Wu, I-Chen [1,3]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa, Japan
[3] Pervas Artificial Intelligence Res PAIR Labs, Hsinchu, Taiwan
Keywords
reinforcement learning; robotics; object grasping; deep deterministic policy gradient; cross-entropy method
DOI
10.1109/taai48200.2019.8959942
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a deep reinforcement learning algorithm for solving robotic tasks such as object grasping, combining the cross-entropy method (CE) with deep deterministic policy gradient (DDPG). More specifically, whereas the standard CE method samples from a Gaussian distribution whose initial mean is zero, we instead set the initial mean to DDPG's output. The resulting algorithm is referred to as DDPG-CE. Next, to reduce the influence of bad samples, we improve DDPG-CE by substituting the CE component with a weighted CE method, resulting in the DDPG-WCE algorithm. Experiments show that DDPG-WCE achieves a higher success rate on grasping previously unseen objects than other approaches, including supervised learning, DDPG, CE, and DDPG-CE.
Pages: 5
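
For a concrete picture of the method the abstract describes, below is a minimal Python/NumPy sketch of the action-refinement loop: a CE search seeded with the DDPG actor's output rather than a zero mean, with an optional weighted-elite update for the WCE variant. All function names, hyperparameters, and the softmax weighting scheme are illustrative assumptions, not the authors' exact implementation.

import numpy as np

def ddpg_wce_action(actor, critic, state, action_dim,
                    n_samples=64, n_elites=8, n_iters=3,
                    init_std=0.3, weighted=True, rng=None):
    """Refine the DDPG actor's action with a (weighted) cross-entropy search.

    actor:  callable, state -> action (the DDPG policy output); stand-in here
    critic: callable, (state, actions) -> Q-values of shape (n_samples,)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Key idea from the paper: initialize the CE Gaussian's mean at the
    # DDPG actor's output instead of zero.
    mean = np.asarray(actor(state), dtype=np.float64)
    std = np.full(action_dim, init_std)

    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, action_dim))
        scores = np.asarray(critic(state, samples))   # Q(s, a) per sample
        elite_idx = np.argsort(scores)[-n_elites:]    # keep the top-Q samples
        elites, elite_scores = samples[elite_idx], scores[elite_idx]

        if weighted:
            # Weighted CE (DDPG-WCE): down-weight poor elites via a softmax
            # over their Q-values (one plausible weighting; the paper's
            # exact scheme may differ).
            w = np.exp(elite_scores - elite_scores.max())
            w /= w.sum()
            mean = (w[:, None] * elites).sum(axis=0)
            std = np.sqrt((w[:, None] * (elites - mean) ** 2).sum(axis=0)) + 1e-6
        else:
            # Plain CE (DDPG-CE): unweighted elite statistics.
            mean = elites.mean(axis=0)
            std = elites.std(axis=0) + 1e-6
    return mean

if __name__ == "__main__":
    # Toy check with stand-in networks: the actor returns zeros and the
    # critic prefers actions close to a fixed target, so the search should
    # converge toward that target.
    target = np.array([0.5, -0.2])
    actor = lambda s: np.zeros(2)
    critic = lambda s, a: -np.sum((a - target) ** 2, axis=1)
    print(ddpg_wce_action(actor, critic, state=None, action_dim=2))

The toy check illustrates why seeding matters: when the actor's output is already near a good action, the CE refinement needs few samples and iterations to improve on it, which is the efficiency argument implicit in the abstract.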