AntNet with Reward-Penalty Reinforcement Learning

Cited by: 21
Authors
Lalbakhsh, Pooia [1]
Zaeri, Bahram [2]
Lalbakhsh, Ali [3]
Fesharaki, Mehdi N. [4]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Borujerd Branch, Borujerd, Lorestan, Iran
[2] Islamic Azad Univ Arak Branch, Young Res Club YRC, Arak, Iran
[3] Islamic Azad Univ Sci & Res Campus, Dept Telecommun Engn, Tehran, Iran
[4] Islamic Azad Univ Sci & Res Campus, Dept Comp Engn, Tehran, Iran
Source
2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, COMMUNICATION SYSTEMS AND NETWORKS (CICSYN) | 2010
Keywords
Ant colony optimization; AntNet; reward-penalty reinforcement learning; swarm intelligence;
DOI
10.1109/CICSyN.2010.11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a modification to the learning phase of the AntNet routing algorithm that improves the system's adaptability in the presence of undesirable events. Unlike most ACO algorithms, which use reward-inaction reinforcement learning, the proposed strategy applies both reward and penalty to the action probabilities. As simulation results show, adding penalty to the AntNet routing algorithm increases exploration of other possible, and sometimes better, selections, which leads to a more adaptive strategy. The proposed algorithm also uses a self-monitoring mechanism called Occurrence-Detection to sense traffic fluctuations and decide how undesirable the current status is. Together, the two strategies yield a self-healing version of the AntNet routing algorithm that can cope with undesirable and unpredictable traffic conditions.
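The abstract does not state the update equations. As a rough illustration only, the sketch below uses the textbook linear reward-penalty scheme from learning automata, which may differ from the rule used in the paper. The function names `reward` and `penalize` and the step sizes `a` and `b` are assumptions made for this example; setting `b = 0` reduces it to the reward-inaction behavior the abstract contrasts against.

```python
def reward(probs, chosen, a):
    """Shift probability mass toward the chosen action (hypothetical helper).

    probs  : list of action probabilities that sums to 1
    chosen : index of the action that received a favorable response
    a      : reward step size in [0, 1] (assumed parameter)
    """
    return [
        p + a * (1.0 - p) if i == chosen else (1.0 - a) * p
        for i, p in enumerate(probs)
    ]


def penalize(probs, chosen, b):
    """Shift probability mass away from the chosen action (hypothetical helper).

    b is the penalty step size in [0, 1] (assumed parameter); b = 0 leaves
    the probabilities unchanged, which is the reward-inaction case.
    Requires at least two actions.
    """
    r = len(probs)
    return [
        (1.0 - b) * p if i == chosen else b / (r - 1) + (1.0 - b) * p
        for i, p in enumerate(probs)
    ]


# Example: three next-hop choices, action 0 is rewarded, then penalized.
probs = [0.5, 0.3, 0.2]
after_reward = reward(probs, chosen=0, a=0.1)
after_penalty = penalize(probs, chosen=0, b=0.1)
```

In both updates the probabilities still sum to 1. The penalty redistributes mass to the other actions, which is the source of the extra exploration the abstract describes.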
Pages: 17-21
Page count: 5