FINDING GEODESICS ON GRAPHS USING REINFORCEMENT LEARNING

Cited: 1
Authors
Kious, Daniel [1 ]
Mailler, Cecile [1 ]
Schapira, Bruno [2 ]
Affiliations
[1] Univ Bath, Dept Math Sci, Bath, Avon, England
[2] Aix Marseille Univ, CNRS, Marseille, France
Source
ANNALS OF APPLIED PROBABILITY | 2022, Vol. 32, No. 5
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Random walks on graphs; linear reinforcement; reinforcement learning; path formation; generalised Polya urns; RANDOM-WALK;
DOI
10.1214/21-AAP1777
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
It is well known in biology that ants are able to find shortest paths between their nest and the food by successive random explorations, without any means of communication other than the pheromones they leave behind them. This striking phenomenon has been observed experimentally and modelled by different mean-field reinforcement-learning models in the biology literature. In this paper, we introduce the first probabilistic reinforcement-learning model for this phenomenon. In this model, the ants explore a finite graph in which two nodes are distinguished as the nest and the source of food. The ants perform successive random walks on this graph, starting from the nest and stopping when they first reach the food; the transition probabilities of each random walk depend on the realizations of all previous walks through some dynamic weighting of the graph. We discuss different variants of this model based on different reinforcement rules and show that slight changes in the reinforcement rule can lead to drastically different outcomes. We prove that the ants indeed eventually find the shortest path(s) between their nest and the food in two variants of this model, when the underlying graph is, respectively, any series-parallel graph and a five-edge non-series-parallel losange graph. Both proofs rely on the electrical network method for random walks on weighted graphs and on Rubin's embedding in continuous time. The proof in the series-parallel case uses the recursive nature of this family of graphs, while the proof in the seemingly simpler losange case turns out to be quite intricate: it relies on a fine analysis of some stochastic approximation, and on various couplings with standard and generalised Pólya urns.
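The dynamics described in the abstract can be sketched in a short toy simulation: each ant performs a weight-proportional random walk from nest to food, and the edges of its (loop-erased) path are then reinforced. The graph, the "+1 on the loop-erased path" rule, and all parameters below are illustrative assumptions, not the exact variants analysed in the paper:

```python
import random
from collections import defaultdict

# Toy graph (an assumption for illustration): nest "N", food "F",
# a short route N-A-F (2 edges) and a long route N-B-C-F (3 edges).
EDGES = [("N", "A"), ("A", "F"),
         ("N", "B"), ("B", "C"), ("C", "F")]

def build_adj(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def walk(adj, w, start="N", food="F"):
    """One ant: random walk with edge-weight-proportional steps,
    stopped on first arrival at the food."""
    path, v = [start], start
    while v != food:
        nbrs = adj[v]
        weights = [w[frozenset((v, u))] for u in nbrs]
        v = random.choices(nbrs, weights=weights)[0]
        path.append(v)
    return path

def loop_erase(path):
    """Erase loops in their order of appearance."""
    out = []
    for v in path:
        if v in out:
            out = out[:out.index(v) + 1]
        else:
            out.append(v)
    return out

random.seed(0)
adj = build_adj(EDGES)
w = {frozenset(e): 1.0 for e in EDGES}   # initial edge weights
for _ in range(2000):                    # successive ants
    p = loop_erase(walk(adj, w))
    for u, v in zip(p, p[1:]):
        w[frozenset((u, v))] += 1.0      # linear reinforcement

short = w[frozenset(("N", "A"))] + w[frozenset(("A", "F"))]
long_ = sum(w[frozenset(e)] for e in [("N", "B"), ("B", "C"), ("C", "F")])
print(round(short, 1), round(long_, 1))
```

In variants where the paper proves convergence, the weight mass eventually concentrates on the geodesic; in this toy run one can compare the accumulated weight on the two routes, bearing in mind that reinforcement processes of this kind can behave very differently under slightly different rules, as the abstract stresses.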
Pages: 3889-3929 (41 pages)