Online Convex Optimization in Adversarial Markov Decision Processes

Cited by: 0
Authors
Rosenberg, Aviv [1]
Mansour, Yishay [1, 2]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
[2] Google Res, Tel Aviv, Israel
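Funding
Israel Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract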
We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes and the transition function is not known to the learner. We show an $\tilde{O}(L|X|\sqrt{|A|T})$ regret bound, where T is the number of episodes, X is the state space, A is the action space, and L is the length of each episode. Our online algorithm is implemented using an entropic regularization methodology, which allows us to extend the original adversarial MDP model to handle convex performance criteria (different ways to aggregate the losses of a single episode), as well as to improve previous regret bounds.
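As a rough illustration of what entropic regularization means in online learning, the sketch below runs the classic exponentiated-gradient (Hedge) update on a probability simplex and measures regret against the best fixed action. This is not the algorithm from the paper, which works over occupancy measures of an MDP with unknown transitions; the function names `entropic_md_step` and `run_entropic_md`, the step size `eta`, and the toy losses are all assumptions made for this example.

```python
import math


def entropic_md_step(weights, losses, eta):
    """One entropic mirror-descent step on the simplex.

    Each weight is multiplied by exp(-eta * loss) and the result is
    renormalized so the weights remain a probability distribution.
    """
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_entropic_md(loss_rounds, eta):
    """Play every round and return (final weights, regret).

    Regret is the learner's cumulative expected loss minus the
    cumulative loss of the best single action in hindsight.
    """
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n  # start from the uniform distribution
    incurred = 0.0
    for losses in loss_rounds:
        # Expected loss of the current distribution on this round.
        incurred += sum(w * l for w, l in zip(weights, losses))
        weights = entropic_md_step(weights, losses, eta)
    best_fixed = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return weights, incurred - best_fixed


# Toy example (hypothetical data): action 0 always has loss 0, action 1 loss 1.
final_weights, regret = run_entropic_md([[0.0, 1.0]] * 20, eta=0.5)
print(final_weights, regret)
```

In this toy run the weights concentrate on the action with zero loss, and the regret stays below the standard Hedge guarantee of ln(n)/eta + eta*T/8 for losses in [0, 1].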
Pages: 9