Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits

Cited by: 6
Authors
Roy, Kaushik [1 ]
Zhang, Qi [1 ]
Gaur, Manas [1 ]
Sheth, Amit [1 ]
Affiliation
[1] Univ South Carolina, Artificial Intelligence Inst, Columbia, SC 29208 USA
DOI
10.1007/978-3-030-86486-6_3
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual bandits have important use cases in real-life scenarios such as online advertising, recommendation systems, and healthcare. However, most algorithms represent context with flat feature vectors, whereas real-world contexts contain a varying number of objects and relations among them that must be modeled. For example, in a music recommendation system, the user context includes the music they listen to, the artists who create that music, the artists' albums, etc. Richer relational context representations, however, introduce a much larger context space, making exploration-exploitation harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the strategy, and relational representations, owing to their descriptive nature, give humans a natural way to specify such knowledge. We propose an adaptation of Knowledge Infused Policy Gradients to the contextual bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm, and we present an experimental analysis on a simulated music recommendation dataset and several real-life datasets, identifying where expert knowledge can drastically reduce the total regret and where it cannot.
Pages: 35-50
Page count: 16
Related Papers
50 records in total
  • [1] Imitation Upper Confidence Bound for Bandits on a Graph
    Lupu, Andrei
    Precup, Doina
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8113 - 8114
  • [2] Asynchronous Upper Confidence Bound Algorithms for Federated Linear Bandits
    Li, Chuanhao
    Wang, Hongning
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 6529 - 6553
  • [3] Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
    Baheri, Ali
    MATHEMATICS, 2025, 13 (01)
  • [4] Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits
    Carpentier, Alexandra
    Lazaric, Alessandro
    Ghavamzadeh, Mohammad
    Munos, Remi
    Auer, Peter
    ALGORITHMIC LEARNING THEORY, 2011, 6925 : 189 - +
  • [5] Beta Upper Confidence Bound Policy for the Design of Clinical Trials
    Dzhoha, Andrii
    Rozora, Iryna
    AUSTRIAN JOURNAL OF STATISTICS, 2023, 52 : 26 - 39
  • [6] Bootstrapping Upper Confidence Bound
    Hao, Botao
    Abbasi-Yadkori, Yasin
    Wen, Zheng
    Cheng, Guang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [7] Scalarized Lower Upper Confidence Bound Algorithm
    Drugan, Madalina M.
    LEARNING AND INTELLIGENT OPTIMIZATION, LION 9, 2015, 8994 : 229 - 235
  • [8] Risk-Aware Multi-Armed Bandits With Refined Upper Confidence Bounds
    Liu, Xingchi
    Derakhshani, Mahsa
    Lambotharan, Sangarapillai
    van der Schaar, Mihaela
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 269 - 273
  • [9] Randomised Gaussian Process Upper Confidence Bound for Bayesian Optimisation
    Berk, Julian
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2284 - 2290
  • [10] Upper Confidence Bound Learning Approach for Real HF Measurements
    Melian-Gutierrez, Laura
    Modi, Navikkumar
    Moy, Christophe
    Perez-Alvarez, Ivan
    Bader, Faouzi
    Zazo, Santiago
    2015 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION WORKSHOP (ICCW), 2015, : 381 - 386