A reinforcement learning approach to stochastic business games

Cited by: 16
Authors
Ravulapati, KK [1]
Rao, J
Das, TK
Affiliations
[1] Delta Technol, Atlanta, GA 30354 USA
[2] Pilgrim Software, Tampa, FL 33618 USA
[3] Univ S Florida, Dept Ind & Management Syst Engn, Tampa, FL 33620 USA
Funding
National Science Foundation (USA)
DOI
10.1080/07408170490278698
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
The Internet revolution has resulted in increased competition among providers of goods and services to lure customers by tearing down the barriers of time and distance. For example, a home buyer shopping for a mortgage loan through the Internet is now a potential customer for a large number of lending institutions throughout the world. The lenders (players, in generic game theory nomenclature) seeking to capture this customer are involved in a nonzero-sum stochastic game. Stochastic games are among the least studied and understood of the management science problems, and no computationally tractable solution technique is available for multi-player nonzero-sum stochastic games. We now develop a computer-simulation-based machine learning algorithm that can be used to solve nonzero-sum stochastic game problems that are modeled as competitive Markov decision processes. The methodology based on this algorithm is implemented on a supply chain inventory planning problem with a limited state space. The equilibrium reward obtained from the stochastic game problem is compared with a logical upper bound obtained from the corresponding Markov decision problem in which a single decision maker (player) is substituted for all the competing players in the game. Several numerical versions of the problem are studied to assess the performance of the methodology. The results obtained from our methodology for the inventory planning problems are within 0.8% of the upper bound.
Pages: 373-385
Number of pages: 13