An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning

Cited: 4
Authors
Mirzanejad, Mohammad [1 ]
Ebrahimi, Morteza [1 ]
Vamplew, Peter [2 ]
Veisi, Hadi [1 ]
Affiliations
[1] Univ Tehran, Fac New Sci & Technol, Tehran, Iran
[2] Federat Univ Australia, Sch Engn Informat Technol & Phys Sci, Ballarat, Vic, Australia
Source
KNOWLEDGE ENGINEERING REVIEW | 2022, Vol. 37, No. 4
Keywords
Decision making; E-learning; Learning algorithms
DOI
10.1017/S0269888921000163
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Conventional reinforcement learning focuses on problems with a single objective. However, many problems have multiple objectives or criteria, which may be independent, related, or contradictory. In such cases, multi-objective reinforcement learning is used to propose a compromise among the solutions that balances the objectives. TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is a multi-criteria decision-making method that selects the alternative with the minimum distance from the positive ideal solution and the maximum distance from the negative ideal solution, so it can be used effectively in the decision-making process to select the next action. In this research, a single-policy algorithm called TOPSIS Q-learning is presented, with a focus on its performance in online mode. Unlike existing single-policy methods, the first version of the algorithm does not require the user to specify the weights of the objectives. Because the user's preferences may not be completely definite, all weight preferences are combined as decision criteria and a solution is generated that considers all of these preferences at once; in this way, the user can model uncertainty and changes in the objective weights around their specified preferences. If the user wants to apply the algorithm for only a specific set of weights, the second version of the algorithm accomplishes that efficiently.
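For intuition, the following is a minimal Python sketch (not the paper's implementation) of how a TOPSIS score can rank the actions available in a state, given a matrix of multi-objective Q-vectors. The function name topsis_action, the use of NumPy, the column-wise vector normalization, and the treatment of every objective as a benefit criterion are illustrative assumptions.

    import numpy as np

    def topsis_action(q_values, weights):
        # q_values: (n_actions, n_objectives) Q-vectors for the current state.
        # weights:  (n_objectives,) user preference weights (assumed to sum to 1).
        norms = np.linalg.norm(q_values, axis=0)     # column-wise Euclidean norms
        norms[norms == 0] = 1.0                      # guard against all-zero columns
        v = (q_values / norms) * weights             # normalized, weighted decision matrix
        pis = v.max(axis=0)                          # positive ideal solution
        nis = v.min(axis=0)                          # negative ideal solution
        d_pos = np.linalg.norm(v - pis, axis=1)      # distance of each action to the positive ideal
        d_neg = np.linalg.norm(v - nis, axis=1)      # distance of each action to the negative ideal
        closeness = d_neg / (d_pos + d_neg + 1e-12)  # TOPSIS closeness coefficient in [0, 1]
        return int(np.argmax(closeness))             # action closest to the positive ideal

Called with, for example, q = np.array([[0.8, 0.1], [0.4, 0.6], [0.2, 0.9]]) and weights = np.array([0.5, 0.5]), it returns the greedy action under a single weight vector; in the first version of the algorithm described in the abstract, the scoring would presumably aggregate over a set of weight preferences rather than one user-specified vector.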
Pages: 29