How to Explore to Maximize Future Return

Cited by: 0
Author
Szepesvari, Csaba [1]
Affiliation
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
With access to huge-scale distributed systems and more data than ever before, learning systems that learn to make good predictions break yesterday's records on a daily basis. Although prediction problems are important, predicting what to do has its own challenges, which call for specialized solution methods. In this talk, using examples based on recent work on reinforcement learning, I will illustrate the unique opportunities and challenges that arise when a system must learn to make good decisions to maximize long-term return. In particular, I will start by demonstrating that passive data collection inevitably leads to catastrophic data sparsity in sequential decision-making problems (no amount of data is big enough!), while clever algorithms tailored to this setting can escape data sparsity, learning essentially arbitrarily faster than is possible under passive data collection. I will also describe current attempts to scale up such clever algorithms to work on large-scale problems. Among the possible approaches, I will discuss the role of sparsity in addressing this challenge in the practical yet mathematically elegant setting of "linear bandits". Interestingly, while in the related linear prediction problem sparsity allows one to deal with huge dimensionality in a seamless fashion, the status of this question in the bandit setting is much less understood.
Pages: 1