An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning

Cited: 4
Authors
Mirzanejad, Mohammad [1 ]
Ebrahimi, Morteza [1 ]
Vamplew, Peter [2 ]
Veisi, Hadi [1 ]
Affiliations
[1] Univ Tehran, Fac New Sci & Technol, Tehran, Iran
[2] Federat Univ Australia, Sch Engn Informat Technol & Phys Sci, Ballarat, Vic, Australia
Source
KNOWLEDGE ENGINEERING REVIEW | 2022, Vol. 37, No. 4
Keywords
Decision making; E-learning; Learning algorithms
DOI
10.1017/S0269888921000163
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Conventional reinforcement learning focuses on problems with a single objective. However, many problems have multiple objectives or criteria, which may be independent, related, or contradictory. In such cases, multi-objective reinforcement learning is used to propose a compromise among the solutions that balances the objectives. TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is a multi-criteria decision-making method that selects the alternative with the minimum distance from the positive ideal solution and the maximum distance from the negative ideal solution, so it can be used effectively in the decision-making process to select the next action. In this research, a single-policy algorithm called TOPSIS Q-learning is presented, with a focus on its performance in online mode. Unlike existing single-policy methods, the first version of the algorithm does not require the user to specify the weights of the objectives. Because the user's preferences may not be completely definite, all weight preferences are combined as decision criteria and a solution is generated that considers all of these preferences at once; in this way, the user can model uncertainty and changes in the objective weights around their specified preferences. If the user wants to apply the algorithm for only a specific set of weights, the second version of the algorithm accomplishes that efficiently.
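For intuition, the following is a minimal Python sketch (not the paper's implementation) of how a TOPSIS score can rank the actions available in a state, given a matrix of multi-objective Q-vectors. The function name topsis_action, the use of NumPy, the column-wise vector normalization, and the treatment of every objective as a benefit criterion are illustrative assumptions.

    import numpy as np

    def topsis_action(q_values, weights):
        # q_values: (n_actions, n_objectives) Q-vectors for the current state.
        # weights:  (n_objectives,) user preference weights (assumed to sum to 1).
        norms = np.linalg.norm(q_values, axis=0)     # column-wise Euclidean norms
        norms[norms == 0] = 1.0                      # guard against all-zero columns
        v = (q_values / norms) * weights             # normalized, weighted decision matrix
        pis = v.max(axis=0)                          # positive ideal solution
        nis = v.min(axis=0)                          # negative ideal solution
        d_pos = np.linalg.norm(v - pis, axis=1)      # distance of each action to the positive ideal
        d_neg = np.linalg.norm(v - nis, axis=1)      # distance of each action to the negative ideal
        closeness = d_neg / (d_pos + d_neg + 1e-12)  # TOPSIS closeness coefficient in [0, 1]
        return int(np.argmax(closeness))             # action closest to the positive ideal

Called with, for example, q = np.array([[0.8, 0.1], [0.4, 0.6], [0.2, 0.9]]) and weights = np.array([0.5, 0.5]), it returns the greedy action under a single weight vector; in the first version of the algorithm described in the abstract, the scoring would presumably aggregate over a set of weight preferences rather than one user-specified vector.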
Pages: 29