FINDING GEODESICS ON GRAPHS USING REINFORCEMENT LEARNING

Cited by: 1
Authors
Kious, Daniel [1 ]
Mailler, Cecile [1 ]
Schapira, Bruno [2 ]
Affiliations
[1] Univ Bath, Dept Math Sci, Bath, Avon, England
[2] Aix Marseille Univ, CNRS, Marseille, France
Source
ANNALS OF APPLIED PROBABILITY | 2022, Vol. 32, No. 5
Funding
Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Random walks on graphs; linear reinforcement; reinforcement learning; path formation; generalised Pólya urns; random walk
DOI
10.1214/21-AAP1777
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
It is well known in biology that ants are able to find shortest paths between their nest and a source of food by successive random explorations, without any means of communication other than the pheromones they leave behind them. This striking phenomenon has been observed experimentally and modelled by different mean-field reinforcement-learning models in the biology literature. In this paper, we introduce the first probabilistic reinforcement-learning model for this phenomenon. In this model, the ants explore a finite graph in which two nodes are distinguished as the nest and the source of food. The ants perform successive random walks on this graph, starting from the nest and stopping when they first reach the food; the transition probabilities of each random walk depend on the realizations of all previous walks through some dynamic weighting of the graph. We discuss different variants of this model based on different reinforcement rules and show that slight changes in the reinforcement rule can lead to drastically different outcomes. We prove that the ants indeed eventually find the shortest path(s) between their nest and the food in two variants of this model, when the underlying graph is, respectively, any series-parallel graph and a five-edge non-series-parallel losange graph. Both proofs rely on the electrical-network method for random walks on weighted graphs and on Rubin's embedding in continuous time. The proof in the series-parallel case uses the recursive nature of this family of graphs, while the proof in the seemingly simpler losange case turns out to be quite intricate: it relies on a fine analysis of a stochastic approximation and on various couplings with standard and generalised Pólya urns.
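The sketch below illustrates the kind of successive reinforced random walks the abstract describes: each ant walks from the nest to the food with transition probabilities proportional to dynamic edge weights, and the weights are updated after each walk. The specific reinforcement rule used here (adding 1 to every edge of the ant's loop-erased nest-to-food path), the example graph, and all names are illustrative assumptions for a minimal simulation; they are not the exact variants analysed in the paper.

```python
# Minimal simulation sketch of successive reinforced random walks on a graph.
# Assumption (not from the paper): each walk reinforces its loop-erased path by +1.
import random
from collections import defaultdict

def loop_erase(path):
    """Erase loops from a walk, keeping the loop-erased nest-to-food path."""
    erased = []
    for v in path:
        if v in erased:
            erased = erased[:erased.index(v) + 1]  # cut out the loop closed at v
        else:
            erased.append(v)
    return erased

def simulate(edges, nest, food, n_ants=10_000, reinforcement=1.0):
    weight = defaultdict(lambda: 1.0)   # all edge weights start at 1
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    for _ in range(n_ants):
        # One ant: weighted random walk from the nest until it first hits the food.
        walk, v = [nest], nest
        while v != food:
            ws = [weight[frozenset((v, u))] for u in nbrs[v]]
            v = random.choices(nbrs[v], weights=ws)[0]
            walk.append(v)
        # Reinforce the edges of the loop-erased path retained by the ant.
        path = loop_erase(walk)
        for a, b in zip(path, path[1:]):
            weight[frozenset((a, b))] += reinforcement

    return weight

if __name__ == "__main__":
    # Hypothetical series-parallel example: a length-2 route nest-m-food
    # in parallel with a length-3 route nest-a-b-food.
    edges = [("nest", "m"), ("m", "food"),
             ("nest", "a"), ("a", "b"), ("b", "food")]
    w = simulate(edges, "nest", "food")
    for e, x in sorted(w.items(), key=lambda kv: -kv[1]):
        print(sorted(e), round(x, 1))
```

Under this illustrative rule, the weights on the shorter nest-m-food route should come to dominate, which is the qualitative behaviour ("the ants eventually find the shortest path") that the paper proves rigorously for its own reinforcement variants.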
Pages: 3889-3929
Number of pages: 41