FINDING GEODESICS ON GRAPHS USING REINFORCEMENT LEARNING

Cited: 1
Authors
Kious, Daniel [1 ]
Mailler, Cecile [1 ]
Schapira, Bruno [2 ]
Affiliations
[1] Univ Bath, Dept Math Sci, Bath, Avon, England
[2] Aix Marseille Univ, CNRS, Marseille, France
Source
ANNALS OF APPLIED PROBABILITY | 2022, Vol. 32, No. 5
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Random walks on graphs; linear reinforcement; reinforcement learning; path formation; generalised Polya urns; RANDOM-WALK;
DOI
10.1214/21-AAP1777
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
It is well known in biology that ants are able to find shortest paths between their nest and the food by successive random explorations, without any means of communication other than the pheromones they leave behind them. This striking phenomenon has been observed experimentally and modelled by different mean-field reinforcement-learning models in the biology literature. In this paper, we introduce the first probabilistic reinforcement-learning model for this phenomenon. In this model, the ants explore a finite graph in which two nodes are distinguished as the nest and the source of food. The ants perform successive random walks on this graph, starting from the nest and stopping when they first reach the food; the transition probabilities of each random walk depend on the realizations of all previous walks through some dynamic weighting of the graph. We discuss different variants of this model based on different reinforcement rules and show that slight changes in the reinforcement rule can lead to drastically different outcomes. We prove that the ants indeed eventually find the shortest path(s) between their nest and the food in two variants of this model, when the underlying graph is, respectively, any series-parallel graph and a five-edge non-series-parallel losange graph. Both proofs rely on the electrical network method for random walks on weighted graphs and on Rubin's embedding in continuous time. The proof in the series-parallel case uses the recursive nature of this family of graphs, while the proof in the seemingly simpler losange case turns out to be quite intricate: it relies on a fine analysis of some stochastic approximation, and on various couplings with standard and generalised Pólya urns.
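The dynamics described in the abstract can be sketched in a short toy simulation: each ant performs a weight-proportional random walk from nest to food, and the edges of its (loop-erased) path are then reinforced. The graph, the "+1 on the loop-erased path" rule, and all parameters below are illustrative assumptions, not the exact variants analysed in the paper:

```python
import random
from collections import defaultdict

# Toy graph (an assumption for illustration): nest "N", food "F",
# a short route N-A-F (2 edges) and a long route N-B-C-F (3 edges).
EDGES = [("N", "A"), ("A", "F"),
         ("N", "B"), ("B", "C"), ("C", "F")]

def build_adj(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def walk(adj, w, start="N", food="F"):
    """One ant: random walk with edge-weight-proportional steps,
    stopped on first arrival at the food."""
    path, v = [start], start
    while v != food:
        nbrs = adj[v]
        weights = [w[frozenset((v, u))] for u in nbrs]
        v = random.choices(nbrs, weights=weights)[0]
        path.append(v)
    return path

def loop_erase(path):
    """Erase loops in their order of appearance."""
    out = []
    for v in path:
        if v in out:
            out = out[:out.index(v) + 1]
        else:
            out.append(v)
    return out

random.seed(0)
adj = build_adj(EDGES)
w = {frozenset(e): 1.0 for e in EDGES}   # initial edge weights
for _ in range(2000):                    # successive ants
    p = loop_erase(walk(adj, w))
    for u, v in zip(p, p[1:]):
        w[frozenset((u, v))] += 1.0      # linear reinforcement

short = w[frozenset(("N", "A"))] + w[frozenset(("A", "F"))]
long_ = sum(w[frozenset(e)] for e in [("N", "B"), ("B", "C"), ("C", "F")])
print(round(short, 1), round(long_, 1))
```

In variants where the paper proves convergence, the weight mass eventually concentrates on the geodesic; in this toy run one can compare the accumulated weight on the two routes, bearing in mind that reinforcement processes of this kind can behave very differently under slightly different rules, as the abstract stresses.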
Pages: 3889-3929 (41 pages)