Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits

Cited by: 6

Authors:

Roy, Kaushik [1]

Zhang, Qi [1]

Gaur, Manas [1]

Sheth, Amit [1]

Affiliations:

[1] Univ South Carolina, Artificial Intelligence Inst, Columbia, SC 29208 USA

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES | 2021 / Volume 12975

Keywords:

DOI:

10.1007/978-3-030-86486-6_3

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Contextual bandits have important use cases in real-life scenarios such as online advertising, recommendation systems, and healthcare. However, most algorithms represent context with flat feature vectors, whereas real-world contexts contain a varying number of objects and the relations among them. For example, in a music recommendation system, the user context includes the music they listen to, the artists who create it, the artists' albums, and so on. Richer relational context representations also introduce a much larger context space, which makes the exploration-exploitation trade-off harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the strategy. Relational context representations offer a natural way for humans to specify knowledge, owing to their descriptive nature. We propose an adaptation of Knowledge Infused Policy Gradients to the Contextual Bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm, and perform an experimental analysis on a simulated music recommendation dataset and various real-life datasets, showing where expert knowledge can drastically reduce the total regret and where it cannot.

Pages: 35 - 50

Page count: 16

50 items in total

[21] A Priority Experience Replay Sampling Method Based on Upper Confidence Bound
Ke, Fengkai
Zhao, Daxing
Sun, Guodong
Feng, Wei
ICDLT 2019: 2019 3RD INTERNATIONAL CONFERENCE ON DEEP LEARNING TECHNOLOGIES, 2019, : 38 - 41
[22] Likelihood-Based Confidence Intervals for a Parameter With an Upper or Lower Bound
Pritikin, Joshua N.
Rappaport, Lance M.
Neale, Michael C.
STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2017, 24 (03) : 395 - 401
[23] Estimating the maximum expected value through upper confidence bound of likelihood
Imagawa, Takahisa
Kaneko, Tomoyuki
2017 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2017, : 202 - 207
[24] Hardware implementation of the upper confidence-bound algorithm for reinforcement learning
Radovic, Nevena
Erceg, Milena
COMPUTERS & ELECTRICAL ENGINEERING, 2021, 96
[25] DUCT: An Upper Confidence Bound Approach to Distributed Constraint Optimization Problems
Ottens, Brammert
Dimitrakakis, Christos
Faltings, Boi
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2017, 8 (05)
[26] Pairwise Regression with Upper Confidence Bound for Contextual Bandit with Multiple Actions
Chang, Ya-Hsuan
Lin, Hsuan-Tien
2013 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2013, : 19 - 24
[27] ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS
Edwards, James
Fearnhead, Paul
Glazebrook, Kevin
PROBABILITY IN THE ENGINEERING AND INFORMATIONAL SCIENCES, 2017, 31 (02) : 239 - 263
[28] Maximal Expectation as Upper Confidence Bound for Multi-armed Bandit Problems
Kao, Kuo-Yuan
Chen, I-Hao
2014 IEEE 7TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC), 2014, : 325 - 329
[29] Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards
Huang, Kuan-Hao
Lin, Hsuan-Tien
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II, 2016, 9652 : 143 - 155
[30] Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
Zoghi, Masrour
Whiteson, Shimon
Munos, Remi
de Rijke, Maarten
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 10 - 18