Discovering symbolic policies with deep reinforcement learning

Cited by: 0

Authors:

Landajuela, Mikel [1]

Petersen, Brenden K. [1]

Kim, Sookyung [1]

Santiago, Claudio P. [1]

Glatt, Ruben [1]

Mundhenk, T. Nathan [1]

Pettit, Jacob F. [1]

Faissol, Daniel M. [1]

Affiliations:

[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Volume 139

Keywords:

ARTIFICIAL NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep reinforcement learning (DRL) has proven successful for many difficult control problems by learning policies represented by neural networks. However, the complexity of neural network-based policies, which involve thousands of composed non-linear operators, can render them problematic to understand, trust, and deploy. In contrast, simple policies comprising short symbolic expressions can facilitate human understanding, while also being transparent and exhibiting predictable behavior. To this end, we propose deep symbolic policy, a novel approach to directly search the space of symbolic policies. We use an auto-regressive recurrent neural network to generate control policies represented by tractable mathematical expressions, employing a risk-seeking policy gradient to maximize performance of the generated policies. To scale to environments with multidimensional action spaces, we propose an "anchoring" algorithm that distills pre-trained neural network-based policies into fully symbolic policies, one action dimension at a time. We also introduce two novel methods to improve exploration in DRL-based combinatorial optimization, building on ideas of entropy regularization and distribution initialization. Despite their dramatically reduced complexity, we demonstrate that discovered symbolic policies outperform seven state-of-the-art DRL algorithms in terms of average rank and average normalized episodic reward across eight benchmark environments.

Pages: 11

50 records in total

[1] Discovering Symbolic Policy for Building Control using Reinforcement Learning
Kim, Soo Kyung
Song, Chihyeon
Chen, Weizhe
Park, Jinkyoo
Mostafavi, Saman
[J]. IFAC PAPERSONLINE, 2023, 56 (02) : 1522 - 1527
[2] Deep Learning and Symbolic Regression for Discovering Parametric Equations
Zhang, Michael
Kim, Samuel
Lu, Peter Y.
Soljacic, Marin
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 13
[3] Symbolic Task Inference in Deep Reinforcement Learning
Hasanbeig, Hosein
Jeppu, Natasha Yogananda
Abate, Alessandro
Melham, Tom
Kroening, Daniel
[J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2024, 80 : 1099 - 1137
[4] Symbolic Task Inference in Deep Reinforcement Learning
Hasanbeig, Hosein
Jeppu, Natasha Yogananda
Abate, Alessandro
Melham, Tom
Kroening, Daniel
[J]. Journal of Artificial Intelligence Research, 2024, 80 : 1099 - 1137
[5] Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks
Li, Chenguang
Kreiman, Gabriel
Ramanathan, Sharad
[J]. NATURE MACHINE INTELLIGENCE, 2024, : 726 - 738
[6] Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks
Li, Chenguang
Kreiman, Gabriel
Ramanathan, Sharad
[J]. NATURE MACHINE INTELLIGENCE, 2024, 6 (06) : 726 - 738
[7] Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions
Daniele, Alessandro
Campari, Tommaso
Malhotra, Sagar
Serafini, Luciano
[J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 3597 - 3605
[8] DEEP REINFORCEMENT LEARNING FOR TRANSFER OF CONTROL POLICIES
Cunningham, James D.
Miller, Simon W.
Yukish, Michael A.
Simpson, Timothy W.
Tucker, Conrad S.
[J]. PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2019, VOL 2A, 2020,
[9] EDGE: Explaining Deep Reinforcement Learning Policies
Guo, Wenbo
Wu, Xian
Khan, Usmann
Xing, Xinyu
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[10] Verified Probabilistic Policies for Deep Reinforcement Learning
Bacci, Edoardo
Parker, David
[J]. NASA FORMAL METHODS (NFM 2022), 2022, 13260 : 193 - 212
