Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control

Authors
Shao, Shengjia [1]
Tsai, Jason [1]
Mysior, Michal [1]
Luk, Wayne [1]
Chau, Thomas [2]
Warren, Alexander [2]
Jeppesen, Ben [2]
Affiliations
[1] Imperial College London, London, England
[2] Intel Corporation, Swindon, Wiltshire, England
Funding
European Union Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives a reward from the environment based on how good its decisions are, and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach that has shown promise in applying accelerated simulation of RL policy training to automate the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO). This yields a 43% speed improvement over a previous FPGA implementation, together with speedups of 4.65 times over deep learning libraries running on a GPU and 19.29 times over a CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of our approach.
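The abstract describes the agent's goal as maximising its long-term cumulative reward. One common way to make that quantity concrete is the discounted return, in which a reward received k steps in the future is weighted by γ^k. The discount factor `gamma`, the helper names and the example reward sequence below are hypothetical and chosen for illustration only; the abstract does not state how the paper's TRPO training defines or discounts the return. A minimal sketch under those assumptions:

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode.

    `gamma` is an assumed discount factor, not a value taken from the paper.
    """
    total = 0.0
    # Walk the episode backwards so each step costs one multiply-add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def returns_to_go(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return from every time step t to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


if __name__ == "__main__":
    # Hypothetical sparse-reward episode: the arm only succeeds on the final step.
    episode = [0.0, 0.0, 1.0]
    print(discounted_return(episode, gamma=0.9))
    print(returns_to_go(episode, gamma=0.9))
```

Per-step returns of this kind are the quantities from which policy-gradient methods such as TRPO typically build their advantage estimates; TRPO additionally restricts each policy update to a trust region bounded by the KL divergence between the old and new policies.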
Pages: 135-142
Number of pages: 8