A heuristic Dyna optimizing algorithm using approximate model representation

Authors
Zhong, Shan [1, 2]
Liu, Quan [1, 3, 5]
Fu, Qiming [1, 4]
Zhang, Zongzhang [1]
Zhu, Fei [1]
Gong, Shengrong [1, 2]
Affiliations
[1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu, 215006, China
[2] School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu, 215500, China
[3] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210000, China
[4] College of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu, 215006, China
[5] Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China
Keywords
Approximation algorithms - Learning algorithms - Heuristic methods - Optimization
DOI
10.7544/issn1000-1239.2015.20148160
Abstract
To address shortcomings of reinforcement learning under the Dyna framework, such as slow convergence, inappropriate representation of the environment model, and delayed learning of environment changes, this paper proposes HDyna-AMR, a novel heuristic Dyna optimization algorithm based on an approximate model. HDyna-AMR approximates the Q-value function with a linear function and solves for the optimal value function by gradient descent. The algorithm has two phases: a learning phase and a planning phase. In the learning phase, it builds an approximate model of the environment through interaction and records how frequently each feature appears. In the planning phase, it plans with the approximate model, adding extra rewards according to the recorded feature frequencies. The paper also proves the convergence of the proposed algorithm theoretically. Experiments on the extended Boyan Chain problem and the Mountain Car problem show that HDyna-AMR obtains an approximately optimal policy in both discrete and continuous state spaces. Compared with Dyna-LAPS (Dyna-style planning with linear approximation and prioritized sweeping) and Sarsa(λ), HDyna-AMR performs better in terms of convergence rate and robustness to changes in the environment. © 2015, Science Press. All rights reserved.
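The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the update rules, so everything concrete here is an assumption, including the class name `HeuristicLinearDyna`, the linear model form (`F`, `b`), the semi-gradient Q-learning update, the one-hot sampling during planning, and the bonus `bonus / sqrt(1 + count)` that favors rarely seen features. A terminal state is assumed to be encoded as the all-zero feature vector.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


class HeuristicLinearDyna:
    """Illustrative Dyna-style agent; every update rule here is an assumption."""

    def __init__(self, n_features, n_actions, alpha=0.1, beta=0.1,
                 gamma=0.95, bonus=0.1):
        self.n, self.m = n_features, n_actions
        self.alpha, self.beta = alpha, beta      # step sizes: value / model
        self.gamma, self.bonus = gamma, bonus    # discount / bonus scale
        n, m = n_features, n_actions
        self.theta = [[0.0] * n for _ in range(m)]   # Q(s, a) = theta[a] . phi(s)
        self.F = [[[0.0] * n for _ in range(n)] for _ in range(m)]  # next-feature model
        self.b = [[0.0] * n for _ in range(m)]       # reward model
        self.counts = [0] * n                        # feature appearance frequency

    def q(self, phi, a):
        return dot(self.theta[a], phi)

    def _td_update(self, phi, a, r, phi_next):
        # Semi-gradient step on the squared TD error of the linear Q function.
        best_next = max(self.q(phi_next, a2) for a2 in range(self.m))
        delta = r + self.gamma * best_next - self.q(phi, a)
        for i, x in enumerate(phi):
            self.theta[a][i] += self.alpha * delta * x

    def learn(self, phi, a, r, phi_next):
        """Learning phase: use one real transition (phi, a, r, phi_next)."""
        for i, x in enumerate(phi):
            if x != 0:
                self.counts[i] += 1              # record active features
        self._td_update(phi, a, r, phi_next)
        # Gradient step on the squared prediction error of the reward model.
        r_err = r - dot(self.b[a], phi)
        for i, x in enumerate(phi):
            self.b[a][i] += self.beta * r_err * x
        # Same for each component of the predicted next feature vector.
        for j in range(self.n):
            f_err = phi_next[j] - dot(self.F[a][j], phi)
            for i, x in enumerate(phi):
                self.F[a][j][i] += self.beta * f_err * x

    def plan(self, steps, rng):
        """Planning phase: simulated updates with a frequency-based bonus."""
        for _ in range(steps):
            i = rng.randrange(self.n)            # sample a one-hot feature vector
            a = rng.randrange(self.m)
            phi = [1.0 if k == i else 0.0 for k in range(self.n)]
            phi_next = [dot(self.F[a][j], phi) for j in range(self.n)]
            # Assumed heuristic: rarely seen features earn a larger extra reward.
            r = dot(self.b[a], phi) + self.bonus / (1 + self.counts[i]) ** 0.5
            self._td_update(phi, a, r, phi_next)
```

In use, `learn` would be called once per real environment step and `plan(k, random.Random(seed))` afterwards to run `k` simulated updates. Setting `bonus=0` reduces the planning step to plain model-based updates, which makes the effect of the frequency heuristic easy to isolate.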
Pages: 2764-2775