Multi-armed linear bandits with latent biases

Cited: 0
Authors
Kang, Qiyu [1 ]
Tay, Wee Peng [1 ]
She, Rui [1 ]
Wang, Sijie [1 ]
Liu, Xiaoqian [2 ]
Yang, Yuan-Rui [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] China Univ Polit Sci & Law, Sch Sociol, Beijing, Peoples R China
Keywords
Linear bandit; Multi-armed bandit; Latent bias; Rewards; Recommendation; Models
DOI
10.1016/j.ins.2024.120103
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
In a linear stochastic bandit model, each arm corresponds to a vector in Euclidean space, and the expected return observed at each time step is determined by an unknown linear function of the selected arm. This paper addresses the challenge of identifying the optimal arm in a linear stochastic bandit model in which latent biases corrupt each arm's expected reward. Unlike traditional linear bandit problems, where the observed return directly represents the reward, this paper considers a scenario in which the unbiased reward at each time step remains unobservable. This model is particularly relevant when the observed return is influenced by latent biases that must be carefully excluded from the learning model. For example, in recommendation systems designed to prevent racially discriminatory suggestions, it is crucial to ensure that users' race does not influence the system; however, observed returns such as click-through rates may already have been influenced by racial attributes. In the case where there are finitely many arms, we develop a strategy that achieves O(|A| log n) regret, where |A| is the number of arms and n is the number of time steps. In the case where each arm is chosen from an infinite compact set, our strategy achieves O(n^(2/3) (log n)^(1/2)) regret. Experiments verify the efficiency of our strategy.
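To make the reward model in the abstract concrete, below is a minimal simulation sketch, assuming Gaussian arm vectors, a Gaussian reward parameter, arm-specific Gaussian latent biases, and Gaussian noise; all of these distributional choices are hypothetical and not taken from the paper. The epsilon-greedy learner included is a naive baseline, not the authors' strategy; it serves only to illustrate how bias-corrupted observations can misidentify the optimal arm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the latent-bias linear bandit model from the abstract.
# Distributional choices below are illustrative assumptions only.
d, n_arms, horizon = 5, 10, 5000
arms = rng.normal(size=(n_arms, d))          # each arm is a vector in R^d
theta = rng.normal(size=d)                   # unknown linear reward parameter
biases = rng.normal(scale=0.5, size=n_arms)  # hypothetical latent per-arm biases

def pull(a: int) -> float:
    """Observed return: unbiased linear reward + latent bias + noise.

    The unbiased reward arms[a] @ theta itself is never observed,
    matching the model described in the abstract.
    """
    return float(arms[a] @ theta + biases[a] + rng.normal(scale=0.1))

# Naive epsilon-greedy learner on the *observed* returns. This is NOT
# the paper's strategy; it only shows how latent biases can pull the
# empirical best arm away from the true (unbiased) best arm.
est = np.zeros(n_arms)
counts = np.zeros(n_arms)
for t in range(horizon):
    a = int(rng.integers(n_arms)) if rng.random() < 0.05 else int(np.argmax(est))
    counts[a] += 1
    est[a] += (pull(a) - est[a]) / counts[a]  # running mean of observed returns

unbiased = arms @ theta
print("best arm by unbiased reward  :", int(np.argmax(unbiased)))
print("best arm by biased estimates :", int(np.argmax(est)))
```

Because the learner here ranks arms by observed returns, an arm with a large latent bias can dominate the estimates even when its unbiased reward is suboptimal, which is precisely the failure mode the paper's strategy is designed to avoid.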
Pages: 19
Related papers
50 records in total
  • [41] Algorithms for Differentially Private Multi-Armed Bandits
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016: 2087-2093
  • [42] Combinatorial Multi-armed Bandits for Resource Allocation
    Zuo, Jinhang
    Joe-Wong, Carlee
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
  • [43] TRANSFER LEARNING FOR CONTEXTUAL MULTI-ARMED BANDITS
    Cai, Changxiao
    Cai, T. Tony
    Li, Hongzhe
    ANNALS OF STATISTICS, 2024, 52 (01): 207-232
  • [44] Quantum Reinforcement Learning for Multi-Armed Bandits
    Liu, Yi-Pei
    Li, Kuo
    Cao, Xi
    Jia, Qing-Shan
    Wang, Xu
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022: 5675-5680
  • [45] Multi-armed bandits in discrete and continuous time
    Kaspi, H
    Mandelbaum, A
    ANNALS OF APPLIED PROBABILITY, 1998, 8 (04): 1270-1290
  • [46] Multi-armed Bandits with Metric Switching Costs
    Guha, Sudipto
    Munagala, Kamesh
    AUTOMATA, LANGUAGES AND PROGRAMMING, PT II, PROCEEDINGS, 2009, 5556: 496+
  • [47] Multiplayer Modeling via Multi-Armed Bandits
    Gray, Robert C.
    Zhu, Jichen
    Ontanon, Santiago
    2021 IEEE CONFERENCE ON GAMES (COG), 2021: 695-702
  • [48] Survey on Applications of Multi-Armed and Contextual Bandits
    Bouneffouf, Djallel
    Rish, Irina
    Aggarwal, Charu
    2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020
  • [49] On Interruptible Pure Exploration in Multi-Armed Bandits
    Shleyfman, Alexander
    Komenda, Antonin
    Domshlak, Carmel
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 3592-3598
  • [50] Thompson Sampling for Budgeted Multi-armed Bandits
    Xia, Yingce
    Li, Haifang
    Qin, Tao
    Yu, Nenghai
    Liu, Tie-Yan
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015: 3960-3966