Multi-armed linear bandits with latent biases

Cited: 0
Authors
Kang, Qiyu [1 ]
Tay, Wee Peng [1 ]
She, Rui [1 ]
Wang, Sijie [1 ]
Liu, Xiaoqian [2 ]
Yang, Yuan-Rui [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] China Univ Polit Sci & Law, Sch Sociol, Beijing, Peoples R China
Keywords
Linear bandit; Multi-armed bandit; Latent bias; Rewards; Recommendation; Models
DOI
10.1016/j.ins.2024.120103
CLC classification
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
In a linear stochastic bandit model, each arm corresponds to a vector in Euclidean space, and the expected return observed at each time step is determined by an unknown linear function of the selected arm. This paper addresses the challenge of identifying the optimal arm in a linear stochastic bandit model where latent biases corrupt each arm's expected reward. Unlike traditional linear bandit problems, where the observed return directly represents the reward, this paper considers a scenario where the unbiased reward at each time step remains unobservable. This model is particularly relevant in situations where the observed return is influenced by latent biases that need to be carefully excluded from the learning model. For example, in recommendation systems designed to prevent racially discriminatory suggestions, it is crucial to ensure that the users' race does not influence the system. However, the observed return, such as click-through rates, may already have been influenced by racial attributes. In the case where there are finitely many arms, we develop a strategy that achieves O(|A| log n) regret, where |A| is the number of arms and n is the number of time steps. In the case where each arm is chosen from an infinite compact set, our strategy achieves O(n^{2/3}(log n)^{1/2}) regret. Experiments verify the efficiency of our strategy.
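To make the model concrete, the following minimal Python sketch (our illustration, not the authors' algorithm) simulates a linear bandit whose observed returns are corrupted by fixed latent per-arm biases, and runs a naive UCB learner on the observed returns. All names (theta_star, biases) and the noise scale are assumptions; the point is that a learner which ignores the biases optimizes reward-plus-bias, so its regret measured against the unbiased rewards need not vanish.

```python
import numpy as np

# Sketch of the biased linear bandit model: observed return =
# <arm, theta_star> + latent per-arm bias + noise. Illustrative only.
rng = np.random.default_rng(0)

d, K, n = 5, 10, 10_000          # dimension, number of arms, horizon
arms = rng.normal(size=(K, d))   # arm feature vectors in R^d
theta_star = rng.normal(size=d)  # unknown linear reward parameter
biases = rng.normal(size=K)      # latent per-arm biases corrupting returns

true_rewards = arms @ theta_star  # unbiased expected rewards
best = true_rewards.max()

def pull(a: int) -> float:
    """Observed return: biased expected reward plus Gaussian noise."""
    return true_rewards[a] + biases[a] + rng.normal(scale=0.1)

# Naive UCB on the observed returns. It converges to the arm maximizing
# reward + bias, which can differ from the unbiased-optimal arm, so the
# unbiased regret below can grow linearly -- the failure mode the paper's
# strategies are designed to avoid.
counts = np.zeros(K)
means = np.zeros(K)
regret = 0.0
for t in range(1, n + 1):
    if t <= K:                    # initialize: pull each arm once
        a = t - 1
    else:
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    r = pull(a)
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]   # running mean of observations
    regret += best - true_rewards[a]         # regret w.r.t. unbiased rewards

print(f"cumulative unbiased regret after {n} rounds: {regret:.1f}")
```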
Pages: 19