Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy

Cited by: 0
Authors
Liu, Zhihan [1 ]
Lu, Miao [2 ]
Wang, Zhaoran [1 ]
Jordan, Michael I. [3 ]
Yang, Zhuoran [4 ]
Affiliations
[1] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
[2] Univ Sci & Technol China, Sch Gifted Young, Hefei, Peoples R China
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[4] Yale Univ, Dept Stat & Data Sci, New Haven, CT 06520 USA
Keywords
RESOURCE-ALLOCATION; STRATEGIES
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We study a bilevel economic system, which we refer to as a Markov exchange economy (MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE involves a central planner and a group of self-interested agents. The goal of the agents is to form a competitive equilibrium (CE), where each agent myopically maximizes her own utility at each step. The goal of the central planner is to steer the system so as to maximize social welfare, defined as the sum of the utilities of all agents. Working in a setting where both the utility functions and the system dynamics are unknown, we propose to find the socially optimal policy and the CE from data via both online and offline variants of MARL. Concretely, we first devise a novel suboptimality metric tailored to the MEE setting, such that minimizing this metric certifies globally optimal policies for both the planner and the agents. Second, in the online setting, we propose an algorithm, dubbed MOLM, which combines the optimism principle for exploration with subgame CE seeking. The algorithm readily incorporates general function approximation tools for handling large state spaces and achieves sublinear regret. Finally, we adapt the algorithm to the offline setting based on the pessimism principle and establish an upper bound on the suboptimality.
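For concreteness, a minimal sketch of the planner's objective as described in the abstract (the horizon H, state s_h, and per-agent action a_{i,h} are illustrative notation assumed here, not taken from the paper): with n agents and per-step utilities u_{i,h}, the social welfare of a planner policy \pi can be written as

    W(\pi) = \mathbb{E}_{\pi}\Big[ \sum_{h=1}^{H} \sum_{i=1}^{n} u_{i,h}(s_h, a_{i,h}) \Big],

so the planner seeks a policy maximizing W(\pi), while at each step the agents' allocations form a competitive equilibrium given the current state.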
Pages: 42