An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning

Cited: 4
Authors
Mirzanejad, Mohammad [1 ]
Ebrahimi, Morteza [1 ]
Vamplew, Peter [2 ]
Veisi, Hadi [1 ]
Affiliations
[1] Univ Tehran, Fac New Sci & Technol, Tehran, Iran
[2] Federat Univ Australia, Sch Engn Informat Technol & Phys Sci, Ballarat, Vic, Australia
Source
KNOWLEDGE ENGINEERING REVIEW | 2022, Vol. 37, No. 4
Keywords
Decision making; E-learning; Learning algorithms
DOI
10.1017/S0269888921000163
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Conventional reinforcement learning focuses on problems with a single objective. However, many problems have multiple objectives or criteria that may be independent, related, or contradictory. In such cases, multi-objective reinforcement learning is used to find a compromise among solutions that balances the objectives. TOPSIS is a multi-criteria decision-making method that selects the alternative with the minimum distance from the positive ideal solution and the maximum distance from the negative ideal solution, so it can be used effectively in the decision-making process to select the next action. This research presents a single-policy algorithm called TOPSIS Q-learning, with a focus on its performance in the online setting. Unlike other single-policy methods, the first version of the algorithm does not require the user to specify the weights of the objectives. Because the user's preferences may not be completely definite, all weight preferences are combined as decision criteria and a solution is generated by considering them all at once; the user can thereby model uncertainty and weight changes around their stated objective preferences. If the user wants to apply the algorithm only for a specific set of weights, the second version of the algorithm accomplishes that efficiently.
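The abstract describes the core TOPSIS step: rank alternatives by their closeness to the positive ideal solution and distance from the negative ideal one. As an illustration only (not the paper's implementation), this selection rule can be sketched in Python; the function name `topsis_select`, the vector normalization, and the treatment of all objectives as benefits are assumptions:

```python
import numpy as np

def topsis_select(q_values, weights):
    """Pick the action whose objective vector is TOPSIS-closest to the ideal.

    q_values : (n_actions, n_objectives) array of per-objective Q-values
    weights  : (n_objectives,) preference weights, assumed non-negative
    """
    # Vector-normalize each objective column, guarding against all-zero columns.
    norms = np.linalg.norm(q_values, axis=0)
    norms[norms == 0] = 1.0
    v = (q_values / norms) * weights

    # Positive and negative ideal solutions (all objectives treated as benefits).
    ideal = v.max(axis=0)
    anti_ideal = v.min(axis=0)

    # Euclidean distances of each action to both ideals.
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti_ideal, axis=1)

    # Relative closeness: 1 means at the ideal, 0 at the negative ideal.
    closeness = d_neg / (d_pos + d_neg + 1e-12)
    return int(np.argmax(closeness))
```

With equal weights this rule favors the action that balances both objectives, while shifting the weights toward one objective shifts the chosen action accordingly.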
Pages: 29
Related papers
50 records in total
  • [1] Accelerated multi-objective task learning using modified Q-learning algorithm
    Rajamohan, Varun Prakash
    Jagatheesaperumal, Senthil Kumar
    INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2024, 47 (01) : 28 - 37
  • [2] Multi-objective route recommendation method based on Q-learning algorithm
    Yu, Qingying
    Xiao, Zhenxing
    Yang, Feng
    Gong, Shan
    Shi, Gege
    Chen, Chuanming
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (04) : 7009 - 7025
  • [3] Cognitive networks QoS multi-objective strategy based on Q-learning algorithm
    Wang, B. (wangbowx@163.com), Advanced Institute of Convergence Information Technology, (07):
  • [4] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning
    Horie, Naoto
    Matsui, Tohgoroh
    Moriyama, Koichi
    Mutoh, Atsuko
    Inuzuka, Nobuhiro
    ARTIFICIAL LIFE AND ROBOTICS, 2019, 24 (03) : 352 - 359
  • [6] A Multi-objective Reinforcement Learning Algorithm for JSSP
    Mendez-Hernandez, Beatriz M.
    Rodriguez-Bazan, Erick D.
    Martinez-Jimenez, Yailen
    Libin, Pieter
    Nowe, Ann
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 567 - 584
  • [7] Decomposition based Multi-Objective Evolutionary Algorithm in XCS for Multi-Objective Reinforcement Learning
    Cheng, Xiu
    Browne, Will N.
    Zhang, Mengjie
    2018 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2018, : 622 - 629
  • [8] A Novel Multi-Objective Deep Q-Network: Addressing Immediate and Delayed Rewards in Multi-Objective Q-Learning
    Zhang, Youming
    IEEE Access, 2024, 12 : 144932 - 144949
  • [9] Multi-objective virtual network embedding algorithm based on Q-learning and curiosity-driven
    He, Mengyang
    Zhuang, Lei
    Tian, Shuaikui
    Wang, Guoqing
    Zhang, Kunli
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2018,
  • [10] Decomposed Multi-objective Method Based on Q-Learning for Solving Multi-objective Combinatorial Optimization Problem
    Yang, Anju
    Liu, Yuan
    Zou, Juan
    Yang, Shengxiang
    BIO-INSPIRED COMPUTING: THEORIES AND APPLICATIONS, PT 1, BIC-TA 2023, 2024, 2061 : 59 - 73