Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit

Cited by: 2
Authors
Li, Ke [1 ]
Yang, Yun [1 ]
Narisetty, Naveen N. [1 ]
Affiliations
[1] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
Source
ELECTRONIC JOURNAL OF STATISTICS | 2021 / Vol. 15 / Issue 02
Keywords
Contextual linear bandit; high-dimension; minimax regret; sparsity; upper confidence bound; VARIABLE SELECTION;
DOI
10.1214/21-EJS1909
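Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes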
020208 ; 070103 ; 0714 ;
Abstract
In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret, in terms of horizon $T$, dimension $d$ and a margin parameter $\alpha \in [0, 1]$, which controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bound results that have different dependencies on $T$ due to the use of different values of margin parameter $\alpha$ explicitly implied by their assumptions. Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered $\ell_1$-ball as the confidence set in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm does not require any forced sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real data analysis are conducted to compare the proposed method with existing ones in the literature.
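As a rough illustration of why an l1-ball confidence set is computationally convenient: the largest value of a linear reward over an l1-ball has a closed form, by duality between the l1 and l-infinity norms. The sketch below is not the paper's algorithm. The abstract does not say how the ball is centered or how its radius is set, so `theta_hat`, `radius`, and the arm features are hypothetical inputs chosen only for illustration.

```python
# Illustrative sketch only. theta_hat, radius, and the arm features are
# hypothetical inputs, not values or procedures taken from the paper.


def l1_ball_ucb(x, theta_hat, radius):
    """Largest value of <x, theta> over the ball ||theta - theta_hat||_1 <= radius.

    By duality between the l1 and l-infinity norms this equals
    <x, theta_hat> + radius * max_j |x_j|.
    """
    point_estimate = sum(xj * tj for xj, tj in zip(x, theta_hat))
    bonus = radius * max(abs(xj) for xj in x)
    return point_estimate + bonus


def choose_arm(arm_features, theta_hat, radius):
    """Return the index of the arm with the largest optimistic reward."""
    scores = [l1_ball_ucb(x, theta_hat, radius) for x in arm_features]
    return max(range(len(scores)), key=scores.__getitem__)


# Two arms in dimension 3, with a sparse hypothetical estimate.
arms = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
theta_hat = [0.5, 0.1, 0.0]
print(choose_arm(arms, theta_hat, radius=0.2))
```

With a small radius the arm with the higher point estimate wins. With a larger radius the exploration bonus dominates and the arm with the larger maximum feature magnitude is preferred.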
Pages: 5652 - 5695
Page count: 44
Related papers
50 in total
  • [11] Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures
    Neyshabouri, Mohammadreza Mohaghegh
    Gokcesu, Kaan
    Gokcesu, Hakan
    Ozkan, Huseyin
    Kozat, Suleyman Serdar
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (03) : 923 - 937
  • [12] A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem
    Kannan, Sampath
    Morgenstern, Jamie
    Roth, Aaron
    Waggoner, Bo
    Wu, Zhiwei Steven
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [13] Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
    Ren, Zhimei
    Zhou, Zhengyuan
    MANAGEMENT SCIENCE, 2024, 70 (02) : 1315 - 1342
  • [14] Lower bound estimation for a family of high-dimensional sparse covariance matrices
    Li, Huimin
    Liu, Youming
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2024, 22 (02)
  • [15] Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization
    Saha, Aadirupa
    Natarajan, Nagarajan
    Netrapalli, Praneeth
    Jain, Prateek
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [16] Optimal Linear Discriminant Analysis for High-Dimensional Functional Data
    Xue, Kaijie
    Yang, Jin
    Yao, Fang
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (546) : 1055 - 1064
  • [17] Discussion of "High-dimensional autocovariance matrices and optimal linear prediction"
    Wu, Wei Biao
    ELECTRONIC JOURNAL OF STATISTICS, 2015, 9 (01) : 789 - 791
  • [18] Optimal Estimation of Genetic Relatedness in High-Dimensional Linear Models
    Guo, Zijian
    Wang, Wanjie
    Cai, T. Tony
    Li, Hongzhe
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2019, 114 (525) : 358 - 369
  • [19] Rejoinder of "High-dimensional autocovariance matrices and optimal linear prediction"
    McMurry, Timothy L.
    Politis, Dimitris N.
    ELECTRONIC JOURNAL OF STATISTICS, 2015, 9 (01) : 811 - 822
  • [20] An optimal ADP algorithm for a high-dimensional stochastic control problem
    Nascimento, Juliana
    Powell, Warren
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 52 - +