Multi-armed linear bandits with latent biases

Cited by: 0
Authors
Kang, Qiyu [1 ]
Tay, Wee Peng [1 ]
She, Rui [1 ]
Wang, Sijie [1 ]
Liu, Xiaoqian [2 ]
Yang, Yuan-Rui [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] China Univ Polit Sci & Law, Sch Sociol, Beijing, Peoples R China
Keywords
Linear bandit; Multi-armed bandit; Latent bias; REWARDS; RECOMMENDATION; MODELS;
DOI
10.1016/j.ins.2024.120103
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In a linear stochastic bandit model, each arm corresponds to a vector in Euclidean space, and the expected return observed at each time step is determined by an unknown linear function of the selected arm. This paper addresses the challenge of identifying the optimal arm in a linear stochastic bandit model in which latent biases corrupt each arm's expected reward. Unlike traditional linear bandit problems, where the observed return directly represents the reward, this paper considers a scenario where the unbiased reward at each time step remains unobservable. This model is particularly relevant in situations where the observed return is influenced by latent biases that need to be carefully excluded from the learning model. For example, in recommendation systems designed to prevent racially discriminatory suggestions, it is crucial to ensure that the users' race does not influence the system. However, the observed return, such as click-through rates, may have already been influenced by racial attributes. In the case where there are finitely many arms, we develop a strategy that achieves O(|A| log n) regret, where |A| is the number of arms and n is the number of time steps. In the case where each arm is chosen from an infinite compact set, our strategy achieves O(n^(2/3) (log n)^(1/2)) regret. Experiments verify the efficiency of our strategy.
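To make the setting concrete, the following small simulation is a sketch under our own assumptions: the additive reward decomposition, all parameter values, and the naive UCB baseline are illustrative, and the paper's strategy is not implemented here. It shows why a standard bandit algorithm is inadequate in this model: an index policy run on the observed returns estimates the biased means rather than the unbiased rewards, so it can lock onto an arm that looks best only because of its latent bias.

import numpy as np

# Sketch of the biased linear bandit model described in the abstract. The
# reward decomposition (unknown parameter theta, per-arm latent bias b) and
# all constants are illustrative assumptions, not the authors' code.
rng = np.random.default_rng(0)
d, n_arms, horizon = 5, 10, 10_000

theta = rng.normal(size=d)                    # unknown linear parameter
arms = rng.normal(size=(n_arms, d))           # finite arm set in R^d
bias = rng.uniform(-1.0, 1.0, size=n_arms)    # latent per-arm biases

true_reward = arms @ theta                    # unbiased expected rewards
best_arm = int(np.argmax(true_reward))

def pull(k):
    # Observed return = unbiased reward + latent bias + noise; the
    # unbiased reward itself is never observed.
    return true_reward[k] + bias[k] + rng.normal(scale=0.1)

# Naive UCB on the observed returns: it concentrates on the arm with the
# largest *biased* mean, which need not be the best unbiased arm.
counts = np.zeros(n_arms)
means = np.zeros(n_arms)
for t in range(1, horizon + 1):
    if t <= n_arms:
        k = t - 1                             # initialize: pull each arm once
    else:
        k = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
    r = pull(k)
    counts[k] += 1
    means[k] += (r - means[k]) / counts[k]

print("best unbiased arm:", best_arm)
print("arm naive UCB favors:", int(np.argmax(counts)))

Whenever argmax(true_reward + bias) differs from argmax(true_reward), the naive learner incurs regret that grows linearly in n with respect to the unbiased objective; the strategies summarized in the abstract are designed to keep this regret at O(|A| log n) for finite arm sets.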
Pages: 19
Related Papers (50 total)
  • [31] Multi-armed bandits with episode context
    Rosin, Christopher D.
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2011, 61 (03) : 203 - 230
  • [32] Introduction to Multi-Armed Bandits Preface
    Slivkins, Aleksandrs
    FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2019, 12 (1-2) : 1 - 286
  • [33] Federated Multi-armed Bandits with Personalization
    Shi, Chengshuai
    Shen, Cong
    Yang, Jing
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [34] OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits
    Chatterji, Niladri S.
    Muthukumar, Vidya
    Bartlett, Peter L.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1844 - 1853
  • [35] Medoids in Almost-Linear Time via Multi-Armed Bandits
    Bagaria, Vivek
    Kamath, Govinda M.
    Ntranos, Vasilis
    Zhang, Martin J.
    Tse, David N.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [36] LEVY BANDITS: MULTI-ARMED BANDITS DRIVEN BY LEVY PROCESSES
    Kaspi, Haya
    Mandelbaum, Avi
    ANNALS OF APPLIED PROBABILITY, 1995, 5 (02) : 541 - 565
  • [37] Successive Reduction of Arms in Multi-Armed Bandits
    Gupta, Neha
    Granmo, Ole-Christoffer
    Agrawala, Ashok
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XXVIII: INCORPORATING APPLICATIONS AND INNOVATIONS IN INTELLIGENT SYSTEMS XIX, 2011 : 181+
  • [38] Quantum greedy algorithms for multi-armed bandits
    Ohno, Hiroshi
    QUANTUM INFORMATION PROCESSING, 2023, 22
  • [39] Online Multi-Armed Bandits with Adaptive Inference
    Dimakopoulou, Maria
    Ren, Zhimei
    Zhou, Zhengyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [40] Multi-Armed Bandits for Adaptive Constraint Propagation
    Balafrej, Amine
    Bessiere, Christian
    Paparrizou, Anastasia
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 290 - 296