FINDING GEODESICS ON GRAPHS USING REINFORCEMENT LEARNING

Cited by: 1
Authors
Kious, Daniel [1 ]
Mailler, Cecile [1 ]
Schapira, Bruno [2 ]
Affiliations
[1] Univ Bath, Dept Math Sci, Bath, Avon, England
[2] Aix Marseille Univ, CNRS, Marseille, France
Source
ANNALS OF APPLIED PROBABILITY | 2022, Vol. 32, No. 5
Funding
Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Random walks on graphs; linear reinforcement; reinforcement learning; path formation; generalised Pólya urns; random walk
DOI
10.1214/21-AAP1777
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
It is well known in biology that ants are able to find shortest paths between their nest and a source of food by successive random explorations, without any means of communication other than the pheromones they leave behind them. This striking phenomenon has been observed experimentally and modelled by different mean-field reinforcement-learning models in the biology literature. In this paper, we introduce the first probabilistic reinforcement-learning model for this phenomenon. In this model, the ants explore a finite graph in which two nodes are distinguished as the nest and the source of food. The ants perform successive random walks on this graph, starting from the nest and stopping when they first reach the food; the transition probabilities of each random walk depend on the realizations of all previous walks through some dynamic weighting of the graph. We discuss different variants of this model based on different reinforcement rules and show that slight changes in the reinforcement rule can lead to drastically different outcomes. We prove that the ants indeed eventually find the shortest path(s) between their nest and the food in two variants of this model, when the underlying graph is, respectively, any series-parallel graph and a five-edge non-series-parallel losange graph. Both proofs rely on the electrical-network method for random walks on weighted graphs and on Rubin's embedding in continuous time. The proof in the series-parallel case uses the recursive nature of this family of graphs, while the proof in the seemingly simpler losange case turns out to be quite intricate: it relies on a fine analysis of a stochastic approximation and on various couplings with standard and generalised Pólya urns.
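The sketch below illustrates the kind of successive reinforced random walks the abstract describes: each ant walks from the nest to the food with transition probabilities proportional to dynamic edge weights, and the weights are updated after each walk. The specific reinforcement rule used here (adding 1 to every edge of the ant's loop-erased nest-to-food path), the example graph, and all names are illustrative assumptions for a minimal simulation; they are not the exact variants analysed in the paper.

```python
# Minimal simulation sketch of successive reinforced random walks on a graph.
# Assumption (not from the paper): each walk reinforces its loop-erased path by +1.
import random
from collections import defaultdict

def loop_erase(path):
    """Erase loops from a walk, keeping the loop-erased nest-to-food path."""
    erased = []
    for v in path:
        if v in erased:
            erased = erased[:erased.index(v) + 1]  # cut out the loop closed at v
        else:
            erased.append(v)
    return erased

def simulate(edges, nest, food, n_ants=10_000, reinforcement=1.0):
    weight = defaultdict(lambda: 1.0)   # all edge weights start at 1
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    for _ in range(n_ants):
        # One ant: weighted random walk from the nest until it first hits the food.
        walk, v = [nest], nest
        while v != food:
            ws = [weight[frozenset((v, u))] for u in nbrs[v]]
            v = random.choices(nbrs[v], weights=ws)[0]
            walk.append(v)
        # Reinforce the edges of the loop-erased path retained by the ant.
        path = loop_erase(walk)
        for a, b in zip(path, path[1:]):
            weight[frozenset((a, b))] += reinforcement

    return weight

if __name__ == "__main__":
    # Hypothetical series-parallel example: a length-2 route nest-m-food
    # in parallel with a length-3 route nest-a-b-food.
    edges = [("nest", "m"), ("m", "food"),
             ("nest", "a"), ("a", "b"), ("b", "food")]
    w = simulate(edges, "nest", "food")
    for e, x in sorted(w.items(), key=lambda kv: -kv[1]):
        print(sorted(e), round(x, 1))
```

Under this illustrative rule, the weights on the shorter nest-m-food route should come to dominate, which is the qualitative behaviour ("the ants eventually find the shortest path") that the paper proves rigorously for its own reinforcement variants.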
Pages: 3889-3929
Number of pages: 41