Learning to Escape: Multi-mode Policy Learning for the Traveling Salesmen Problem

Authors
Ha, Myoung Hoon [1]
Chi, Seunggeun [2]
Lee, Sang Wan [3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Ctr Neurosci Inspired AI, Daejeon, South Korea
[2] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[3] Korea Adv Inst Sci & Technol, Dept Brain Cognit Sci, Daejeon, South Korea
Keywords
Traveling Salesmen Problem; Neural Combinatoric Optimization; Deep Reinforcement Learning; Transformer
DOI
10.1109/EAIS58494.2024.10569999
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The traveling salesmen problem (TSP), one of the most fundamental NP-hard problems in combinatorial optimization, has received considerable attention owing to its direct applicability to real-world routing. Recent studies on TSP have adopted a deep policy network to learn a stochastic acceptance rule. Despite its success in some cases, the structural and functional complexity of deep policy networks makes it hard to explore the problem space while performing a local search at the same time. Our empirical analyses show that search processes often get stuck in a local region, leading to severe performance degradation. To tackle this issue, we propose a novel method for multi-mode policy learning. In the proposed method, the conventional exploration-exploitation scheme is reformulated as the problem of learning to escape from a local search area to induce exploration. We present a multi-mode Markov decision process, followed by policy and value design for the local search and escaping modes. Experimental results show that the proposed method outperforms various baseline models, suggesting that the learned escaping policy allows the model to efficiently initiate a new local search in promising regions.
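The two-mode idea in the abstract can be illustrated with a small hand-coded sketch. This is not the authors' method: the paper learns both policies with a deep network, whereas the code below swaps in a fixed 2-opt rule for the local search mode and a random double-bridge perturbation for the escaping mode (a classic iterated local search). All function names, the step budget, and the choice of perturbation are assumptions made for illustration only.

```python
import math
import random


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt_step(points, tour):
    """Local search mode: return the first improving 2-opt neighbor, or None if stuck."""
    n = len(tour)
    current = tour_length(points, tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            # Reverse the segment between positions i+1 and j.
            candidate = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
            if tour_length(points, candidate) < current - 1e-12:
                return candidate
    return None


def double_bridge(tour, rng):
    """Escaping mode (assumed stand-in for a learned policy): swap two inner segments."""
    a, b, c = sorted(rng.sample(range(1, len(tour)), 3))  # needs at least 4 cities
    return tour[:a] + tour[b:c] + tour[a:b] + tour[c:]


def multi_mode_search(points, steps=200, seed=0):
    """Alternate between local search and escaping; return the best tour found."""
    rng = random.Random(seed)
    tour = list(range(len(points)))
    best = tour[:]
    for _ in range(steps):
        improved = two_opt_step(points, tour)
        if improved is not None:
            tour = improved  # stay in local search mode
        else:
            # Local optimum reached: record it if better, then escape from the best tour.
            if tour_length(points, tour) < tour_length(points, best):
                best = tour[:]
            tour = double_bridge(best, rng)
    if tour_length(points, tour) < tour_length(points, best):
        best = tour[:]
    return best
```

Calling `multi_mode_search(points)` on a list of `(x, y)` tuples returns a permutation of city indices. In the paper's setting, both the switch between modes and the escape move itself would be produced by learned policies rather than by the fixed rules used here.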
Pages: 107-117
Number of pages: 11
Related papers
  • [1] Learning agents for the multi-mode project scheduling problem
    Wauters, T.
    Verbeeck, K.
    Vanden Berghe, G.
    De Causmaecker, P.
    JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2011, 62 (02) : 281 - 290
  • [2] Multi-mode Learning: Not One Learning Mode is Strictly Better than the Other
    Daga, Harshit
    Gavrilovska, Ada
    PROCEEDINGS OF THE 2023 THE 2ND ACM WORKSHOP ON DATA PRIVACY AND FEDERATED LEARNING TECHNOLOGIES FOR MOBILE EDGE NETWORK, FEDEDGE 2023, 2023, : 113 - 118
  • [3] The Performance of Ant System in Solving Multi Traveling Salesmen Problem
    Kencana, Eka N.
    Harini, Ida
    Mayuliana, K.
    4TH INFORMATION SYSTEMS INTERNATIONAL CONFERENCE (ISICO 2017), 2017, 124 : 46 - 52
  • [4] Multiperiod Multi Traveling Salesmen Problem with Time Window Constraints
    Yapicioglu, Haluk
    NETWORKS & SPATIAL ECONOMICS, 2018, 18 (04) : 773 - 801
  • [5] Relative distances approach for multi-traveling salesmen problem
    Erguven, Emre
    Polat, Faruk
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [6] Imperial competitive algorithm with policy learning for the traveling salesman problem
    Meng-Hui Chen
    Shih-Hsin Chen
    Pei-Chann Chang
    Soft Computing, 2017, 21 : 1863 - 1875
  • [7] Imperial competitive algorithm with policy learning for the traveling salesman problem
    Chen, Meng-Hui
    Chen, Shih-Hsin
    Chang, Pei-Chann
    SOFT COMPUTING, 2017, 21 (07) : 1863 - 1875
  • [8] Multi-Mode Learning Supported Model Predictive Control with Guarantees
    Bethge, Johanna
    Morabito, Bruno
    Matschek, Janine
    Findeisen, Rolf
    IFAC PAPERSONLINE, 2018, 51 (20) : 517 - 522
  • [9] Multi-mode trade policy retaliation
    Feinberg, Robert M.
    Nes, Kjersti
    Reynolds, Kara M.
    Schaefer, Aleks
    REVIEW OF WORLD ECONOMICS, 2024
  • [10] Learning effects on muscle modes and multi-mode postural synergies
    Asaka, Tadayoshi
    Wang, Yun
    Fukushima, Junko
    Latash, Mark L.
    EXPERIMENTAL BRAIN RESEARCH, 2008, 184 (03) : 323 - 338