Gradient-based policy iteration: An example

Cited by: 0
Authors
Cao, XR [1]
Fang, HT [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Elect & Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
discrete event dynamic systems; potentials; Poisson equations; W-factors; Q-learning; perturbation analysis; Markov decision processes
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it was shown that policy iteration in fact chooses the policy that has the steepest performance gradient (provided by PA) for the next iteration. This sensitivity point of view of MDP leads to some new research topics. In this note, we propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated and hence the standard policy iteration cannot apply. We illustrate the main ideas with an example of an M/G/1/N queue and identify some further research topics.
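The abstract's claim that policy iteration picks the policy with the steepest performance gradient can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the M/G/1/N example. It assumes an ergodic, average-reward MDP with two states and two actions per state; the arrays `TRANS` and `REWARD` and all function names are hypothetical, chosen only for illustration. It covers only the standard case where actions are chosen independently in each state, not the correlated case the note targets.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical MDP: TRANS[i][a] is the transition row, REWARD[i][a] the reward.
TRANS = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.5, 0.5], [0.1, 0.9]]]
REWARD = [[1.0, 0.0],
          [0.0, 2.0]]

def evaluate(policy):
    """Return (pi, g, eta): stationary distribution, potentials, average reward."""
    n = len(policy)
    P = [TRANS[i][a] for i, a in enumerate(policy)]
    f = [REWARD[i][a] for i, a in enumerate(policy)]
    # pi P = pi, with the last equation replaced by sum(pi) = 1.
    A = [[P[j][i] - (i == j) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    pi = solve(A, [0.0] * (n - 1) + [1.0])
    # Poisson equation in the form (I - P + e pi) g = f.
    B = [[(i == j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    g = solve(B, f)
    eta = sum(p * r for p, r in zip(pi, f))
    return pi, g, eta

def q_value(i, a, g):
    """Per-state term f(i, a) + sum_j p(j | i, a) g(j)."""
    return REWARD[i][a] + sum(p * gj for p, gj in zip(TRANS[i][a], g))

def gradient(policy, other):
    """Derivative of eta at `policy` in the direction of `other`:
    pi [(P' - P) g + (f' - f)]."""
    pi, g, _ = evaluate(policy)
    return sum(pi[i] * (q_value(i, b, g) - q_value(i, a, g))
               for i, (a, b) in enumerate(zip(policy, other)))

def improve(policy):
    """Per-state maximization; since pi > 0, this maximizes the gradient.
    The current action is kept unless another is strictly better."""
    _, g, _ = evaluate(policy)
    new = []
    for i, a in enumerate(policy):
        best = max(range(len(TRANS[i])), key=lambda x: q_value(i, x, g))
        new.append(best if q_value(i, best, g) > q_value(i, a, g) + 1e-12 else a)
    return new

def policy_iteration(policy):
    """Repeat the improvement step until the policy no longer changes."""
    while True:
        new = improve(policy)
        if new == policy:
            return policy
        policy = new
```

Calling `policy_iteration([0, 0])` returns the policy at which no direction has a positive gradient. Because the stationary probabilities are positive, maximizing each state's term separately also maximizes the weighted sum, which is why the improvement step coincides with the steepest-gradient choice in this uncorrelated setting.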
Pages: 3367-3371
Page count: 5