Combining Deep Deterministic Policy Gradient with Cross-Entropy Method

Cited by: 0
|
Authors
Lai, Tung-Yi [1 ]
Hsueh, Chu-Hsuan [1 ,2 ]
Lin, You-Hsuan [1 ]
Chu, Yeong-Jia Roger [1 ]
Hsueh, Bo-Yang [1 ]
Wu, I-Chen [1 ,3 ]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa, Japan
[3] Pervas Artificial Intelligence Res PAIR Labs, Hsinchu, Taiwan
Keywords
reinforcement learning; robotics; object grasping; deep deterministic policy gradient; cross-entropy method;
DOI
10.1109/taai48200.2019.8959942
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a deep reinforcement learning algorithm for robotic tasks such as object grasping, combining the cross-entropy method (CE) with deep deterministic policy gradient (DDPG). More specifically, whereas the standard CE method first samples from a Gaussian distribution whose initial mean is zero, we instead set the initial mean to DDPG's output. The resulting algorithm is referred to as DDPG-CE. Next, to mitigate the effect of bad samples, we improve on DDPG-CE by replacing the CE component with a weighted CE method, yielding the DDPG-WCE algorithm. Experiments show that DDPG-WCE achieves a higher success rate on grasping previously unseen objects than other approaches, including supervised learning, DDPG, CE, and DDPG-CE.
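The abstract's core idea, seeding a weighted cross-entropy search with the DDPG actor's output instead of a zero mean, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`wce_refine`, `q_value`), the softmax weighting of elites, and all hyperparameters (sample count, elite count, iteration count, initial standard deviation) are choices made here for demonstration.

```python
import numpy as np

def wce_refine(q_value, a_init, n_samples=64, n_elite=8, n_iters=3, sigma=0.3):
    """Refine an action with a weighted cross-entropy search.

    The Gaussian mean starts at a_init (e.g. the DDPG actor's output)
    rather than zero, and elites are weighted by a softmax over their
    Q-values so that worse samples contribute less to the updated
    mean and standard deviation.
    """
    mean = np.array(a_init, dtype=float)
    std = np.full_like(mean, sigma)
    for _ in range(n_iters):
        # Sample candidate actions around the current mean.
        samples = np.random.normal(mean, std, size=(n_samples, mean.size))
        scores = np.array([q_value(s) for s in samples])
        # Keep the n_elite highest-scoring samples.
        elite_idx = np.argsort(scores)[-n_elite:]
        elites, elite_scores = samples[elite_idx], scores[elite_idx]
        # Softmax weights (shifted for numerical stability) downweight
        # the weaker elites, the "weighted" part of weighted CE.
        w = np.exp(elite_scores - elite_scores.max())
        w /= w.sum()
        mean = (w[:, None] * elites).sum(axis=0)
        std = np.sqrt((w[:, None] * (elites - mean) ** 2).sum(axis=0)) + 1e-6
    return mean

# Toy critic with its maximum at a = [0.5, -0.2]; in the paper's setting
# this role is played by the learned DDPG critic Q(s, a).
q = lambda a: -np.sum((a - np.array([0.5, -0.2])) ** 2)
refined = wce_refine(q, a_init=np.zeros(2))
```

With a real critic, `a_init` would be the actor's action for the current state, so the search only needs a few iterations to improve on it locally.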
Pages: 5