Discovering symbolic policies with deep reinforcement learning

Cited by: 0
Authors
Landajuela, Mikel [1]
Petersen, Brenden K. [1]
Kim, Sookyung [1]
Santiago, Claudio P. [1]
Glatt, Ruben [1]
Mundhenk, T. Nathan [1]
Pettit, Jacob F. [1]
Faissol, Daniel M. [1]
Affiliations
[1] Lawrence Livermore National Laboratory, Livermore, CA 94550 USA
Keywords
ARTIFICIAL NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) has proven successful for many difficult control problems by learning policies represented by neural networks. However, the complexity of neural network-based policies, which involve thousands of composed non-linear operators, can render them problematic to understand, trust, and deploy. In contrast, simple policies comprising short symbolic expressions can facilitate human understanding, while also being transparent and exhibiting predictable behavior. To this end, we propose deep symbolic policy, a novel approach to directly search the space of symbolic policies. We use an auto-regressive recurrent neural network to generate control policies represented by tractable mathematical expressions, employing a risk-seeking policy gradient to maximize performance of the generated policies. To scale to environments with multidimensional action spaces, we propose an "anchoring" algorithm that distills pre-trained neural network-based policies into fully symbolic policies, one action dimension at a time. We also introduce two novel methods to improve exploration in DRL-based combinatorial optimization, building on ideas of entropy regularization and distribution initialization. Despite their dramatically reduced complexity, we demonstrate that discovered symbolic policies outperform seven state-of-the-art DRL algorithms in terms of average rank and average normalized episodic reward across eight benchmark environments.
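To make the idea of a policy "represented by a tractable mathematical expression" concrete, the following is a minimal sketch, not the authors' implementation. It assumes an expression is stored as a list of tokens in prefix notation and evaluated on an observation vector to produce one action value. The token library `OPS`, the `s0`, `s1`, ... naming of state variables, and both function names are hypothetical. The second function illustrates one possible reading of a risk-seeking objective, in which only the top `epsilon` fraction of sampled expressions (by reward) would contribute to the gradient update; the abstract does not state this detail, so it is an assumption.

```python
import math

# Hypothetical token library: name -> (arity, function). Not taken from the paper.
OPS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
}


def evaluate(tokens, obs):
    """Evaluate a prefix-notation expression on one observation vector."""

    def walk(pos):
        tok = tokens[pos]
        if tok in OPS:
            arity, fn = OPS[tok]
            pos += 1
            args = []
            for _ in range(arity):
                val, pos = walk(pos)  # consume one child subtree
                args.append(val)
            return fn(*args), pos
        if tok.startswith("s"):  # state variable such as "s0" (assumed naming)
            return obs[int(tok[1:])], pos + 1
        return float(tok), pos + 1  # numeric constant

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("expression has trailing tokens")
    return value


def risk_seeking_batch(rewards, epsilon):
    """Return indices of samples whose reward is within the top epsilon fraction.

    Assumption: a risk-seeking update uses only these samples.
    """
    ranked = sorted(rewards)
    k = max(1, math.ceil(epsilon * len(rewards)))  # number of samples to keep
    threshold = ranked[-k]
    return [i for i, r in enumerate(rewards) if r >= threshold]


# Usage: the policy a = 2 * s1 - s0 applied to the observation [1.0, 3.0].
action = evaluate(["sub", "mul", "2.0", "s1", "s0"], [1.0, 3.0])
elite = risk_seeking_batch([0.1, 0.9, 0.5, 0.7], epsilon=0.5)
```

In this sketch a whole policy is five tokens long, which is the kind of compactness the abstract contrasts with neural network policies. In the described approach the token sequence would be sampled by the recurrent network rather than written by hand.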
Pages: 11