Minimax-Optimal Policy Learning Under Unobserved Confounding

Cited by: 0

Authors:

Kallus, Nathan [1]

Zhou, Angela [1]

Affiliations:

[1] Cornell Univ, New York, NY 10044 USA

Source:

MANAGEMENT SCIENCE | 2021 / Vol. 67 / No. 05

Funding:

National Science Foundation (USA);

Keywords:

policy learning; optimization; causal inference; personalized medicine; data-driven decision making; REGRET TREATMENT CHOICE; SENSITIVITY-ANALYSIS; IDENTIFICATION; THERAPY; TRIALS; ISSUES;

DOI:

10.1287/mnsc.2020.3699

Chinese Library Classification (CLC) number:

C93 [Management science];

Subject classification code:

12 ; 1201 ; 1202 ; 120202 ;

Abstract:

We study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding. Previous approaches, which assume unconfoundedness, that is, that no unobserved confounders affect both the treatment assignment and the outcome, can lead to policies that introduce harm rather than benefit when some unobserved confounding is present, as is generally the case with observational data. Instead, because policy value and regret may not be point-identifiable, we study a method that minimizes the worst-case estimated regret of a candidate policy against a baseline policy over an uncertainty set for propensity weights that controls the extent of unobserved confounding. We prove generalization guarantees that ensure our policy is safe when applied in practice and in fact obtains the best possible uniform control on the range of all possible population regrets that agree with the possible extent of confounding. We develop efficient algorithmic solutions to compute this minimax-optimal policy. Finally, we assess and compare our methods on synthetic and semisynthetic data. In particular, we consider a case study on personalizing hormone replacement therapy based on observational data, in which we validate our results on a randomized experiment. We demonstrate that hidden confounding can hinder existing policy-learning approaches and lead to unwarranted harm, whereas our robust approach guarantees safety and focuses on well-evidenced improvement, a necessity for making personalized treatment policies learned from observational data reliable in practice.
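
Illustrative note (not part of the record): the abstract's "worst-case estimated regret ... over an uncertainty set for propensity weights" can be sketched as a small computation. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes the uncertainty set is a box on inverse-propensity weights derived from an odds-ratio bound `gamma` (a marginal-sensitivity-style model, which the abstract does not specify), and that the regret estimate is a self-normalized weighted mean of per-sample terms `r`. The function names `msm_weight_bounds` and `worst_case_weighted_mean` are hypothetical.

```python
import numpy as np


def msm_weight_bounds(propensity, gamma):
    """Box bounds on inverse-propensity weights (assumed model).

    Assumption: the odds of the true propensity differ from the odds of
    the nominal propensity by a factor of at most gamma >= 1.
    """
    inv_odds = 1.0 / propensity - 1.0  # (1 - e) / e for nominal propensity e
    lower = 1.0 + inv_odds / gamma
    upper = 1.0 + inv_odds * gamma
    return lower, upper


def worst_case_weighted_mean(r, lower, upper):
    """Maximize sum(w * r) / sum(w) subject to lower <= w <= upper.

    The maximizer of this linear-fractional objective over a box has a
    threshold form: upper weights on the largest r, lower weights on the
    rest. Enumerating every threshold therefore finds the maximum.
    """
    order = np.argsort(-r)  # indices sorting r in descending order
    r_s, lo_s, hi_s = r[order], lower[order], upper[order]
    best = -np.inf
    for k in range(len(r_s) + 1):
        # Top-k terms receive the upper weight, the others the lower weight.
        w = np.concatenate([hi_s[:k], lo_s[k:]])
        best = max(best, float(np.dot(w, r_s) / w.sum()))
    return best


# Hypothetical toy inputs: two samples with nominal propensity 0.5.
r = np.array([1.0, 0.0])
lower, upper = msm_weight_bounds(np.array([0.5, 0.5]), gamma=2.0)
print(worst_case_weighted_mean(r, lower, upper))
```

With `gamma = 1` the lower and upper bounds coincide and the function returns the ordinary self-normalized weighted mean. A robust policy learner in the spirit of the abstract would then minimize this worst-case quantity over candidate policies, a step this sketch leaves out.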

Pages: 2870 - 2890

Page count: 21
