Q-learning algorithms for constrained Markov decision processes with randomized monotone policies: Application to MIMO transmission control

Cited by: 59
Authors
Djonin, Dejan V. [1 ]
Krishnamurthy, Vikram [2]
Affiliations
[1] Dyaptive Inc, Vancouver, BC V6E 4A6, Canada
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T 1Z4, Canada
Keywords
constrained Markov decision process (CMDP); delay constraints; monotone policies; Q learning; randomized policies; reinforcement learning; supermodularity; transmission scheduling; V-BLAST;
DOI
10.1109/TSP.2007.893228
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
This paper presents novel Q-learning-based stochastic control algorithms for rate and power control in V-BLAST transmission systems. The algorithms exploit the supermodularity and monotonic structure results derived in the companion paper. The rate and power control problem is posed as a stochastic optimization problem with the goal of minimizing the average transmission power subject to a constraint on the average delay, which can be interpreted as the quality-of-service requirement of a given application. The standard Q-learning algorithm is modified to handle the constraint so that it can adaptively learn the structured optimal policy when the channel/traffic statistics are unknown. We discuss the convergence of the proposed algorithms and explore their properties in simulations. To address the issue of unknown transmission costs in an unknown time-varying environment, we propose a variant of the Q-learning algorithm in which the power costs are estimated in an online fashion, and we show that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
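The constrained Q-learning idea described in the abstract can be illustrated with a small sketch. Below is a minimal, hypothetical Python sketch of a Lagrangian (primal-dual) Q-learning update for a constrained MDP: the per-step cost is the transmission power plus a Lagrange multiplier times the delay, and the multiplier is adapted on a slower timescale toward the delay bound. The environment interface (`env.reset()`, `env.step()`), step sizes, and other parameters are illustrative assumptions, not the authors' exact algorithm, and the sketch does not enforce the monotone randomized policy structure from the companion paper.

```python
import numpy as np

def constrained_q_learning(env, n_states, n_actions, delay_bound,
                           episodes=5000, alpha=0.1, beta=0.01,
                           gamma=0.95, eps=0.1, seed=0):
    # Hypothetical sketch of Lagrangian-based constrained Q-learning.
    # `lam` prices the average-delay constraint, so the scalarized
    # per-step cost is power + lam * delay.
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    lam = 0.0          # Lagrange multiplier for the delay constraint
    avg_delay = 0.0    # running estimate of the average delay
    for _ in range(episodes):
        s = env.reset()                     # assumed env API (hypothetical)
        done = False
        while not done:
            # epsilon-greedy exploration over the scalarized Q-values
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmin(Q[s]))    # costs are minimized
            s2, power, delay, done = env.step(a)   # assumed step signature
            cost = power + lam * delay      # Lagrangian per-step cost
            target = cost + gamma * np.min(Q[s2])
            Q[s, a] += alpha * (target - Q[s, a])  # standard Q update
            # slower-timescale multiplier update, projected to stay >= 0
            avg_delay += beta * (delay - avg_delay)
            lam = max(0.0, lam + beta * (avg_delay - delay_bound))
            s = s2
    return Q, lam
```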
Pages: 2170-2181
Number of pages: 12
Related Papers
20 records
  • [1] MIMO transmission control in fading channels - A constrained Markov decision process formulation with monotone randomized policies
    Djonin, Dejan V.
    Krishnamurthy, Vikram
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (10) : 5069 - 5083
  • [2] Safe Q-Learning Method Based on Constrained Markov Decision Processes
    Ge, Yangyang
    Zhu, Fei
    Lin, Xinghong
    Liu, Quan
    IEEE ACCESS, 2019, 7 : 165007 - 165017
  • [3] A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes
    Lakshmanan, K.
    Bhatnagar, Shalabh
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 400 - 405
  • [4] Q-learning for Markov decision processes with a satisfiability criterion
    Shah, Suhail M.
    Borkar, Vivek S.
    SYSTEMS & CONTROL LETTERS, 2018, 113 : 45 - 51
  • [5] Non-randomized policies for constrained Markov decision processes
    Chen, Richard C.
    Feinberg, Eugene A.
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2007, 66 (01) : 165 - 179
  • [6] Risk-aware Q-Learning for Markov Decision Processes
    Huang, Wenjie
    Haskell, William B.
    2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017,
  • [7] On Q-learning Convergence for Non-Markov Decision Processes
    Majeed, Sultan Javed
    Hutter, Marcus
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2546 - 2552
  • [8] Learning algorithms for finite horizon constrained Markov decision processes
    Mittal, A.
    Hemachandra, N.
    JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2007, 3 (03) : 429 - 444
  • [9] A Q-learning algorithm for Markov decision processes with continuous state spaces
    Hu, Jiaqiao
    Yang, Xiangyu
    Hu, Jian-Qiang
    Peng, Yijie
    SYSTEMS & CONTROL LETTERS, 2024, 187