Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control

Cited by: 0

Authors:

Shao, Shengjia [1]

Tsai, Jason [1]

Mysior, Michal [1]

Luk, Wayne [1]

Chau, Thomas [2]

Warren, Alexander [2]

Jeppesen, Ben [2]

Affiliations:

[1] Imperial Coll London, London, England

[2] Intel Corp, Swindon, Wilts, England

Source:

2018 IEEE 29TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP) | 2018

Funding:

EU Horizon 2020; UK Engineering and Physical Sciences Research Council

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives a reward from the environment based on how good its decisions are and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach that shows promise in applying accelerated simulation of RL policy training to automate the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO). This yields a 43% speed improvement over a previous FPGA implementation, a 4.65-times speedup over deep learning libraries running on a GPU, and a 19.29-times speedup over a CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of the approach.


Pages: 135-142

Page count: 8
