Learning to Escape: Multi-mode Policy Learning for the Traveling Salesmen Problem

Cited by: 0
Authors
Ha, Myoung Hoon [1]
Chi, Seunggeun [2]
Lee, Sang Wan [3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Ctr Neurosci Inspired AI, Daejeon, South Korea
[2] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[3] Korea Adv Inst Sci & Technol, Dept Brain Cognit Sci, Daejeon, South Korea
Keywords
Traveling Salesmen Problem; Neural Combinatoric Optimization; Deep Reinforcement Learning; Transformer;
DOI
10.1109/EAIS58494.2024.10569999
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The traveling salesmen problem (TSP), one of the most fundamental NP-hard problems in combinatorial optimization, has received considerable attention owing to its direct applicability to real-world routing. Recent studies on TSP have adopted a deep policy network to learn a stochastic acceptance rule. Despite its success in some cases, the structural and functional complexity of deep policy networks makes it hard to explore the problem space while performing a local search at the same time. Our empirical analyses show that the search process often gets stuck in a local region, leading to severe performance degradation. To tackle this issue, we propose a novel method for multi-mode policy learning. In the proposed method, the conventional exploration-exploitation scheme is reformulated as the problem of learning to escape from a local search area in order to induce exploration. We present a multi-mode Markov decision process, followed by policy and value design for the local search and escaping modes. Experimental results show that the proposed method outperforms various baseline models, suggesting that the learned escaping policy allows the model to efficiently initiate a new local search in promising regions.
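The two modes described in the abstract can be illustrated with a small hand-coded sketch. This is not the authors' method: the paper learns both the acceptance rule and the escaping policy with deep networks, whereas here a fixed Metropolis-style rule over random 2-opt moves stands in for the local search mode, and a double-bridge perturbation of the best tour, triggered after a fixed number of non-improving steps, stands in for the escaping mode. All names and parameters (`search`, `patience`, `temperature`, `double_bridge`) are hypothetical and chosen only for illustration.

```python
import math
import random


def tour_length(tour, pts):
    """Total length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt_move(tour, rng):
    """Local search move: reverse a randomly chosen segment of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def double_bridge(tour, rng):
    """Escape move (assumed stand-in): swap two inner segments of the tour."""
    a, b, c = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:a] + tour[b:c] + tour[a:b] + tour[c:]


def search(pts, steps=2000, patience=100, temperature=0.01, seed=0):
    """Alternate between a local search mode and an escaping mode.

    Requires at least 4 points so that the double-bridge move is defined.
    """
    rng = random.Random(seed)
    current = list(range(len(pts)))
    rng.shuffle(current)
    cur_len = tour_length(current, pts)
    best, best_len = current[:], cur_len
    stalled = 0  # steps since the best tour last improved

    for _ in range(steps):
        if stalled >= patience:
            # Escaping mode: jump away from the stalled region by
            # perturbing the best tour found so far (a fixed heuristic,
            # not a learned policy).
            current = double_bridge(best, rng)
            cur_len = tour_length(current, pts)
            stalled = 0
            continue

        # Local search mode: propose a 2-opt move and apply a
        # stochastic (Metropolis-style) acceptance rule.
        cand = two_opt_move(current, rng)
        cand_len = tour_length(cand, pts)
        delta = cand_len - cur_len
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, cur_len = cand, cand_len

        if cur_len < best_len - 1e-12:
            best, best_len = current[:], cur_len
            stalled = 0
        else:
            stalled += 1

    return best, best_len
```

Calling `search(pts)` on a list of 2D coordinate tuples returns the best tour found and its length. Replacing the fixed `patience` trigger and the random perturbation with learned components is where a method like the one in the abstract would differ from this sketch.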
Pages: 107-117
Number of pages: 11
Related papers
50 in total
  • [21] Separation of multi-mode surface waves by supervised machine learning methods
    Li, Jing
    Chen, Yuqing
    Schuster, Gerard T.
    GEOPHYSICAL PROSPECTING, 2020, 68 (04) : 1270 - 1280
  • [22] AMARL: An Attention-Based Multiagent Reinforcement Learning Approach to the Min-Max Multiple Traveling Salesmen Problem
    Gao, Hao
    Zhou, Xing
    Xu, Xin
    Lan, Yixing
    Xiao, Yongqian
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (07) : 9758 - 9772
  • [23] Evolutionary Algorithm for the k-Interconnected Multi-Depot Multi-Traveling Salesmen Problem
    Andrade, Carlos E.
    Miyazawa, Flavio K.
    Resende, Mauricio G. C.
    GECCO'13: PROCEEDINGS OF THE 2013 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2013, : 463 - 470
  • [24] The Machine Learning and Traveling Repairman Problem
    Tulabandhula, Theja
    Rudin, Cynthia
    Jaillet, Patrick
    ALGORITHMIC DECISION THEORY, 2011, 6992 : 262 - 276
  • [25] Mirror Adaptive Impedance Control of Multi-Mode Soft Exoskeleton With Reinforcement Learning
    Xu, Jiajun
    Huang, Kaizhen
    Zhang, Tianyi
    Zhao, Mengcheng
    Ji, Aihong
    Li, Youfu
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 6773 - 6785
  • [26] Multi-mode Light: Learning Special Collaboration Patterns for Traffic Signal Control
    Chen, Zhi
    Zhao, Shengjie
    Deng, Hao
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT II, 2022, 13530 : 63 - 74
  • [27] Adaptive Multi-Mode Routing Algorithm for FANET Based on Deep Reinforcement Learning
    Huang, Kai
    Qiu, Xiulin
    Yin, Jun
    Yang, Yuwang
    Computer Engineering and Applications, 2023, 59 (14) : 268 - 274
  • [28] Construction of Multi-mode Affective Learning System: Taking Affective Design as an Example
    Lin, Hao-Chiang Koong
    Su, Sheng-Hsiung
    Chao, Ching-Ju
    Hsieh, Cheng-Yen
    Tsai, Shang-Chin
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2016, 19 (02) : 132 - 147
  • [29] A data-driven meta-learning recommendation model for multi-mode resource constrained project scheduling problem
    Chu, Xianghua
    Li, Shuxiang
    Gao, Fei
    Cui, Can
    Pfeiffer, Forest
    Cui, Jianshuang
    COMPUTERS & OPERATIONS RESEARCH, 2023, 157
  • [30] Batch active learning for time-series classification with multi-mode exploration
    Lee, Sangho
    Choi, Chihyeon
    Do, Hyungrok
    Son, Youngdoo
    INFORMATION SCIENCES, 2025, 711