Combining Deep Deterministic Policy Gradient with Cross-Entropy Method

Cited by: 0
|
Authors
Lai, Tung-Yi [1 ]
Hsueh, Chu-Hsuan [1 ,2 ]
Lin, You-Hsuan [1 ]
Chu, Yeong-Jia Roger [1 ]
Hsueh, Bo-Yang [1 ]
Wu, I-Chen [1 ,3 ]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa, Japan
[3] Pervas Artificial Intelligence Res PAIR Labs, Hsinchu, Taiwan
Keywords
reinforcement learning; robotics; object grasping; deep deterministic policy gradient; cross-entropy method;
DOI
10.1109/taai48200.2019.8959942
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a deep reinforcement learning algorithm for robotic tasks such as object grasping, combining the cross-entropy method (CE) with deep deterministic policy gradient (DDPG). More specifically, whereas the standard CE method first samples from a Gaussian distribution whose initial mean is zero, we instead set the initial mean to DDPG's output. The resulting algorithm is referred to as DDPG-CE. Next, to mitigate the effect of bad samples, we improve on DDPG-CE by replacing the CE component with a weighted CE method, yielding the DDPG-WCE algorithm. Experiments show that DDPG-WCE achieves a higher success rate on grasping previously unseen objects than other approaches, including supervised learning, DDPG, CE, and DDPG-CE.
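The abstract's core idea, seeding a weighted cross-entropy search with the DDPG actor's output instead of a zero mean, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`wce_refine`, `q_value`), the softmax weighting of elites, and all hyperparameters (sample count, elite count, iteration count, initial standard deviation) are choices made here for demonstration.

```python
import numpy as np

def wce_refine(q_value, a_init, n_samples=64, n_elite=8, n_iters=3, sigma=0.3):
    """Refine an action with a weighted cross-entropy search.

    The Gaussian mean starts at a_init (e.g. the DDPG actor's output)
    rather than zero, and elites are weighted by a softmax over their
    Q-values so that worse samples contribute less to the updated
    mean and standard deviation.
    """
    mean = np.array(a_init, dtype=float)
    std = np.full_like(mean, sigma)
    for _ in range(n_iters):
        # Sample candidate actions around the current mean.
        samples = np.random.normal(mean, std, size=(n_samples, mean.size))
        scores = np.array([q_value(s) for s in samples])
        # Keep the n_elite highest-scoring samples.
        elite_idx = np.argsort(scores)[-n_elite:]
        elites, elite_scores = samples[elite_idx], scores[elite_idx]
        # Softmax weights (shifted for numerical stability) downweight
        # the weaker elites, the "weighted" part of weighted CE.
        w = np.exp(elite_scores - elite_scores.max())
        w /= w.sum()
        mean = (w[:, None] * elites).sum(axis=0)
        std = np.sqrt((w[:, None] * (elites - mean) ** 2).sum(axis=0)) + 1e-6
    return mean

# Toy critic with its maximum at a = [0.5, -0.2]; in the paper's setting
# this role is played by the learned DDPG critic Q(s, a).
q = lambda a: -np.sum((a - np.array([0.5, -0.2])) ** 2)
refined = wce_refine(q, a_init=np.zeros(2))
```

With a real critic, `a_init` would be the actor's action for the current state, so the search only needs a few iterations to improve on it locally.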
Pages: 5