A learning search algorithm with propagational reinforcement learning

Cited by: 0

Authors:

Zhang, Wei [1]

Affiliations:

[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China

Source:

APPLIED INTELLIGENCE | 2021 / Vol. 51 / Issue 11

Keywords:

Machine learning; Heuristic search; Reinforcement learning; Deep neural network; Deep learning; Learning search;

DOI:

10.1007/s10489-020-02117-0

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

When reinforcement learning with a deep neural network is applied to heuristic search, the search becomes a learning search. In a learning search system, there are two key components: (1) a deep neural network with sufficient expression ability as a heuristic function approximator that estimates the distance from any state to a goal; (2) a strategy to guide the interaction of an agent with its environment to obtain more efficient simulated experience to update the Q-value or V-value function of reinforcement learning. To date, neither component has been sufficiently discussed. This study theoretically discusses the size of a deep neural network for approximating a product function of p piecewise multivariate linear functions. The existence of such a deep neural network with O(n + p) layers and O(dn + dnp + dp) neurons has been proven, where d is the number of variables of the multivariate function being approximated, ε is the approximation error, and n = O(p + log_2(pd/ε)). For the second component, this study proposes a general propagational reinforcement-learning-based learning search method that improves the estimate h(·) according to the newly observed distance information about the goals, propagates the improvement bidirectionally in the search tree, and consequently obtains a sequence of more accurate V-values for a sequence of states. Experiments on the maze problems show that our method increases the convergence rate of reinforcement learning by a factor of 2.06 and reduces the number of learning episodes to 1/4 that of other nonpropagating methods.
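
The propagation idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a tabular estimate h (no neural network), unit step costs, reversible moves as in a maze, and a small hand-built search tree. The names `propagate`, `children`, `parent`, and the state labels are hypothetical. When a goal is observed, its improved estimate is relaxed toward the parent and toward the children, so a whole set of states receives tighter distance estimates from a single observation.

```python
from collections import deque


def propagate(children, parent, h, start):
    """Relax distance-to-goal estimates over tree edges, in both directions.

    Assumes unit step costs and reversible moves, so an improved estimate
    h[s] implies h[t] <= h[s] + 1 for the parent and every child t of s.
    """
    queue = deque([start])
    while queue:
        s = queue.popleft()
        # Neighbours in the tree: children (downward) and parent (upward).
        neighbours = list(children.get(s, []))
        if parent.get(s) is not None:
            neighbours.append(parent[s])
        for t in neighbours:
            if h[s] + 1 < h[t]:
                h[t] = h[s] + 1  # tighter estimate found, keep propagating
                queue.append(t)
    return h


# Hypothetical search tree: A is the root, G is the goal.
children = {"A": ["B", "C"], "B": ["D"], "D": ["G"]}
parent = {"A": None, "B": "A", "C": "A", "D": "B", "G": "D"}

# Pessimistic initial estimates for every state.
h = {s: 10 for s in parent}

# Newly observed information: G is a goal, so its distance is 0.
h["G"] = 0
propagate(children, parent, h, "G")
print(h)  # {'A': 3, 'B': 2, 'C': 4, 'D': 1, 'G': 0}
```

Note that state C, which is not on the path from the root to the goal, also receives an improved estimate because the update flows upward to A and then downward again.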


Pages: 7990-8009

Page count: 20

Related records (50 in total)

[1] A learning search algorithm with propagational reinforcement learning
Wei Zhang
[J]. Applied Intelligence, 2021, 51 : 7990 - 8009
[2] A Direct Policy-Search Algorithm for Relational Reinforcement Learning
Sarjant, Samuel
Pfahringer, Bernhard
Driessens, Kurt
Smith, Tony
[J]. INDUCTIVE LOGIC PROGRAMMING: 23RD INTERNATIONAL CONFERENCE, 2014, 8812 : 76 - 92
[3] Reinforcement Learning based Search (RLS) algorithm in Social Networks
Peyravi, Farzad
Derhami, Vali
Latif, Alimohammad
[J]. 2015 INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2015, : 206 - 210
[4] A reinforcement learning algorithm to improve scheduling search heuristics with the SVM
Gersmann, K
Hammer, B
[J]. 2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 1811 - 1816
[5] On the Search for Feedback in Reinforcement Learning
Wang, Ran
Parunandi, Karthikeya S.
Sharma, Aayushman
Goyal, Raman
Chakravorty, Suman
[J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 1560 - 1567
[6] Localizing search in reinforcement learning
Grudic, G
Ungar, L
[J]. SEVENTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-2001) / TWELFTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-2000), 2000, : 590 - 595
[7] Optimization of heuristic search using recursive algorithm selection and reinforcement learning
Vasilikos, Vasileios
Lagoudakis, Michail G.
[J]. ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2010, 60 (1-2) : 119 - 151
[8] Optimization of heuristic search using recursive algorithm selection and reinforcement learning
Vasileios Vasilikos
Michail G. Lagoudakis
[J]. Annals of Mathematics and Artificial Intelligence, 2010, 60 : 119 - 151
[9] RLIRank: Learning to Rank with Reinforcement Learning for Dynamic Search
Zhou, Jianghong
Agichtein, Eugene
[J]. WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 2842 - 2848
[10] A genetic algorithm for reinforcement learning
Zhao, L
Liu, ZM
[J]. ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1056 - 1060
