Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits

Cited by: 6
Authors
Roy, Kaushik [1]
Zhang, Qi [1]
Gaur, Manas [1]
Sheth, Amit [1]
Affiliations
[1] Univ South Carolina, Artificial Intelligence Inst, Columbia, SC 29208 USA
Keywords
DOI
10.1007/978-3-030-86486-6_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Contextual bandits have important use cases in real-life scenarios such as online advertising, recommendation systems, and healthcare. However, most algorithms represent context with flat feature vectors, whereas real-world contexts contain a varying number of objects and relations among them. For example, in a music recommendation system, the user context includes what music they listen to, which artists create this music, the artists' albums, etc. Richer relational context representations also introduce a much larger context space, making exploration-exploitation harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the exploration-exploitation strategy. Relational context representations offer a natural way for humans to specify knowledge owing to their descriptive nature. We propose an adaptation of Knowledge Infused Policy Gradients to the Contextual Bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm. We perform an experimental analysis on a simulated music recommendation dataset and various real-life datasets, covering cases where expert knowledge can drastically reduce the total regret and cases where it cannot.
Pages: 35-50
Page count: 16
Related papers
50 items in total
  • [21] A Priority Experience Replay Sampling Method Based on Upper Confidence Bound
    Ke, Fengkai
    Zhao, Daxing
    Sun, Guodong
    Feng, Wei
    ICDLT 2019: 2019 3RD INTERNATIONAL CONFERENCE ON DEEP LEARNING TECHNOLOGIES, 2019: 38 - 41
  • [22] Likelihood-Based Confidence Intervals for a Parameter With an Upper or Lower Bound
    Pritikin, Joshua N.
    Rappaport, Lance M.
    Neale, Michael C.
    STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2017, 24 (03) : 395 - 401
  • [23] Estimating the maximum expected value through upper confidence bound of likelihood
    Imagawa, Takahisa
    Kaneko, Tomoyuki
    2017 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2017: 202 - 207
  • [24] Hardware implementation of the upper confidence-bound algorithm for reinforcement learning
    Radovic, Nevena
    Erceg, Milena
    COMPUTERS & ELECTRICAL ENGINEERING, 2021, 96
  • [25] DUCT: An Upper Confidence Bound Approach to Distributed Constraint Optimization Problems
    Ottens, Brammert
    Dimitrakakis, Christos
    Faltings, Boi
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2017, 8 (05)
  • [26] Pairwise Regression with Upper Confidence Bound for Contextual Bandit with Multiple Actions
    Chang, Ya-Hsuan
    Lin, Hsuan-Tien
    2013 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2013: 19 - 24
  • [27] ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS
    Edwards, James
    Fearnhead, Paul
    Glazebrook, Kevin
    PROBABILITY IN THE ENGINEERING AND INFORMATIONAL SCIENCES, 2017, 31 (02) : 239 - 263
  • [28] Maximal Expectation as Upper Confidence Bound for Multi-armed Bandit Problems
    Kao, Kuo-Yuan
    Chen, I-Hao
    2014 IEEE 7TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC), 2014: 325 - 329
  • [29] Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards
    Huang, Kuan-Hao
    Lin, Hsuan-Tien
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II, 2016, 9652 : 143 - 155
  • [30] Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
    Zoghi, Masrour
    Whiteson, Shimon
    Munos, Remi
    de Rijke, Maarten
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 10 - 18