An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning

Cited by: 4
Authors
Mirzanejad, Mohammad [1 ]
Ebrahimi, Morteza [1 ]
Vamplew, Peter [2 ]
Veisi, Hadi [1 ]
Affiliations
[1] Univ Tehran, Fac New Sci & Technol, Tehran, Iran
[2] Federat Univ Australia, Sch Engn Informat Technol & Phys Sci, Ballarat, Vic, Australia
Source
KNOWLEDGE ENGINEERING REVIEW | 2022, Vol. 37, Issue 4
Keywords
Decision making; E-learning; Learning algorithms
DOI
10.1017/S0269888921000163
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Conventional reinforcement learning focuses on problems with a single objective. However, many problems involve multiple objectives or criteria that may be independent, related, or contradictory. In such cases, multi-objective reinforcement learning is used to find a compromise among solutions that balances the objectives. TOPSIS is a multi-criteria decision-making method that selects the alternative with the minimum distance from the positive ideal solution and the maximum distance from the negative ideal solution, so it can be used effectively in the decision-making process to select the next action. This research presents a single-policy algorithm called TOPSIS Q-learning, with a focus on its performance in online mode. Unlike other single-policy methods, the first version of the algorithm does not require the user to specify the weights of the objectives. Because the user's preferences may not be completely definite, all weight preferences are combined as decision criteria and a solution is generated that considers all of them at once; the user can thereby model uncertainty and weight changes around their stated objective preferences. If the user wants to apply the algorithm only for a specific set of weights, the second version of the algorithm accomplishes that efficiently.
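The TOPSIS action-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustrative implementation of standard TOPSIS ranking applied to per-objective Q-values, not the authors' code; the function name `topsis_select` and the assumption that every objective is a benefit criterion (larger is better) are mine.

```python
import numpy as np

def topsis_select(q_values, weights):
    """Pick the action whose per-objective Q-values are TOPSIS-closest to the ideal.

    q_values: (n_actions, n_objectives) array of estimated Q-values
    weights:  (n_objectives,) importance weights for the objectives
    """
    q = np.asarray(q_values, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Vector-normalize each objective column (guard against all-zero columns).
    norms = np.linalg.norm(q, axis=0)
    norms[norms == 0] = 1.0
    v = (q / norms) * w

    # Positive/negative ideal solutions; all objectives treated as benefits here.
    pis = v.max(axis=0)
    nis = v.min(axis=0)

    # Euclidean distances of each action to the two ideal points.
    d_pos = np.linalg.norm(v - pis, axis=1)
    d_neg = np.linalg.norm(v - nis, axis=1)

    # Closeness coefficient: 1 means at the positive ideal, 0 at the negative.
    closeness = d_neg / (d_pos + d_neg + 1e-12)
    return int(np.argmax(closeness))
```

For example, with two objectives weighted equally, an action scoring well on both objectives is ranked above actions that excel on only one, which is the compromise behaviour the abstract describes.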
Pages: 29
Related Papers (50 total)
  • [41] Multi-objective Reinforcement Learning for Responsive Grids
    Julien Perez
    Cécile Germain-Renaud
    Balazs Kégl
    Charles Loomis
    Journal of Grid Computing, 2010, 8 : 473 - 492
  • [42] Pedestrian simulation as multi-objective reinforcement learning
    Ravichandran, Naresh Balaji
    Yang, Fangkai
    Peters, Christopher
    Lansner, Anders
    Herman, Pawel
    18TH ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (IVA'18), 2018, : 307 - 312
  • [43] Continuous reinforcement learning to adapt multi-objective optimization online for robot motion
    Zhang, Kai
    McLeod, Sterling
    Lee, Minwoo
    Xiao, Jing
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2020, 17 (02)
  • [44] Fuzzy Q-Learning for generalization of reinforcement learning
    Berenji, HR
    FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996, : 2208 - 2214
  • [45] Multi-objective fuzzy Q-learning to solve continuous state-action problems
    Asgharnia, Amirhossein
    Schwartz, Howard
    Atia, Mohamed
    NEUROCOMPUTING, 2023, 516 : 115 - 132
  • [46] Q-Learning Based Multi-objective Optimization Routing Strategy in UAVs Deterministic Network
    Zhou, Zou
    Chen, Longjie
    Hu, Yu
    Zheng, Fei
    Liang, Caisheng
    Li, Kelin
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 399 - 408
  • [47] Multi-objective optimization of radiotherapy: distributed Q-learning and agent-based simulation
    Jalalimanesh, Ammar
    Haghighi, Hamidreza Shahabi
    Ahmadi, Abbas
    Hejazian, Hossein
    Soltani, Madjid
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2017, 29 (05) : 1071 - 1086
  • [48] Deep Reinforcement Learning with Double Q-Learning
    van Hasselt, Hado
    Guez, Arthur
    Silver, David
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2094 - 2100
  • [49] Reinforcement learning guidance law of Q-learning
    Zhang Q.
    Ao B.
    Zhang Q.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2020, 42 (02): : 414 - 419
  • [50] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193