Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits

Cited by: 6
Authors
Roy, Kaushik [1 ]
Zhang, Qi [1 ]
Gaur, Manas [1 ]
Sheth, Amit [1 ]
Affiliation
[1] Univ South Carolina, Artificial Intelligence Inst, Columbia, SC 29208 USA
DOI
10.1007/978-3-030-86486-6_3
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual bandits have important use cases in real-life scenarios such as online advertising, recommendation systems, and healthcare. However, most algorithms represent context with flat feature vectors, whereas real-world contexts contain a varying number of objects and relations among them that must be modeled. For example, in a music recommendation system, the user context includes the music they listen to, the artists who create that music, the artists' albums, etc. Richer relational context representations, however, introduce a much larger context space, making exploration-exploitation harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the strategy, and relational representations, owing to their descriptive nature, give humans a natural way to specify such knowledge. We propose an adaptation of Knowledge Infused Policy Gradients to the contextual bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm, and we present an experimental analysis on a simulated music recommendation dataset and several real-life datasets, identifying where expert knowledge can drastically reduce the total regret and where it cannot.
Pages: 35-50
Page count: 16
Related Papers
50 records in total
  • [1] Imitation Upper Confidence Bound for Bandits on a Graph
    Lupu, Andrei
    Precup, Doina
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8113 - 8114
  • [2] Asynchronous Upper Confidence Bound Algorithms for Federated Linear Bandits
    Li, Chuanhao
    Wang, Hongning
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 6529 - 6553
  • [3] Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
    Baheri, Ali
    MATHEMATICS, 2025, 13 (01)
  • [4] Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits
    Carpentier, Alexandra
    Lazaric, Alessandro
    Ghavamzadeh, Mohammad
    Munos, Remi
    Auer, Peter
    ALGORITHMIC LEARNING THEORY, 2011, 6925 : 189 - +
  • [5] Beta Upper Confidence Bound Policy for the Design of Clinical Trials
    Dzhoha, Andrii
    Rozora, Iryna
    AUSTRIAN JOURNAL OF STATISTICS, 2023, 52 : 26 - 39
  • [6] Bootstrapping Upper Confidence Bound
    Hao, Botao
    Abbasi-Yadkori, Yasin
    Wen, Zheng
    Cheng, Guang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [7] Scalarized Lower Upper Confidence Bound Algorithm
    Drugan, Madalina M.
    LEARNING AND INTELLIGENT OPTIMIZATION, LION 9, 2015, 8994 : 229 - 235
  • [8] Risk-Aware Multi-Armed Bandits With Refined Upper Confidence Bounds
    Liu, Xingchi
    Derakhshani, Mahsa
    Lambotharan, Sangarapillai
    van der Schaar, Mihaela
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 269 - 273
  • [9] Randomised Gaussian Process Upper Confidence Bound for Bayesian Optimisation
    Berk, Julian
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2284 - 2290
  • [10] Upper Confidence Bound Learning Approach for Real HF Measurements
    Melian-Gutierrez, Laura
    Modi, Navikkumar
    Moy, Christophe
    Perez-Alvarez, Ivan
    Bader, Faouzi
    Zazo, Santiago
    2015 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION WORKSHOP (ICCW), 2015, : 381 - 386