A Mutual Information-Based Assessment of Reverse Engineering on Rewards of Reinforcement Learning

Cited by: 0
Authors
Chen T. [1 ]
Liu J. [1 ]
Baker T. [2 ]
Wu Y. [1 ]
Xiang Y. [1 ]
Li Y. [1 ]
Niu W. [1 ]
Tong E. [1 ]
Zomaya A.Y. [3 ]
Affiliations
[1] Beijing Jiaotong University, Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing
[2] College of Computing and Informatics, University of Sharjah, Department of Computer Science, Sharjah
[3] The University of Sydney, School of Computer Science, Sydney
Funding
National Natural Science Foundation of China
Keywords
Assessment; mutual information; reinforcement learning (RL); reverse engineering; tensor model;
DOI
10.1109/TAI.2022.3190811
Abstract
Rewards are critical hyperparameters in reinforcement learning (RL), since in most cases different reward values lead to greatly different performance. Because of their commercial value, RL rewards have become a target of reverse engineering by the inverse reinforcement learning (IRL) algorithm family. Existing efforts typically use two metrics to measure IRL performance: the expected value difference (EVD) and the mean reward loss (MRL). Unfortunately, in some cases EVD and MRL give completely opposite results, because MRL considers rewards over the whole state space, whereas EVD considers only partly sampled rewards. This situation naturally raises one fundamental question: whether current metrics and assessments are sufficient and accurate for more general use. Thus, in this article, we propose a novel IRL assessment based on a metric called normalized mutual information of reward clusters (C-NMI); we aim to fill this research gap by considering a middle-granularity state space between the entire state space and the specific sampling space. We utilize the agglomerative nesting algorithm (AGNES) to control dynamic C-NMI computation via a fourth-order tensor model with injected manipulated trajectories. With such a model, we can uniformly capture the different-dimension values of MRL, EVD, and C-NMI, and perform more comprehensive and accurate assessment and analysis. Extensive experiments on several mainstream IRL algorithms are conducted in the object world environment, revealing that the assessment accuracy of our method increases by 110.13% and 116.59% compared with EVD and MRL, respectively. Meanwhile, C-NMI is more robust than EVD and MRL under different demonstrations. © 2020 IEEE.
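The idea of comparing reward clusters with normalized mutual information can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact C-NMI definition, so the sketch assumes scalar per-state rewards, average-linkage bottom-up (AGNES-style) clustering, and NMI normalized by the geometric mean of the two entropies. The function names `agnes_1d` and `nmi` and all reward values are hypothetical.

```python
import math
from collections import Counter


def agnes_1d(values, k):
    """Bottom-up (AGNES-style) clustering of scalar rewards into k clusters.

    Assumption: average linkage on absolute differences.
    """
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between the two candidate clusters.
                total = sum(abs(values[i] - values[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair
        del clusters[b]
    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def nmi(x, y):
    """Normalized mutual information between two label lists.

    Assumption: normalization by sqrt(H(x) * H(y)).
    """
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    if hx == 0.0 or hy == 0.0:
        return 1.0 if hx == hy else 0.0
    mi = sum(c / n * math.log(c * n / (px[a] * py[b]))
             for (a, b), c in pxy.items())
    return mi / math.sqrt(hx * hy)


# Hypothetical ground-truth rewards over eight states.
true_r = [0, 0, 0, 1, 1, 1, 5, 5]
# Recovered rewards with a different scale but the same grouping of states.
good_r = [0.1, 0.0, 0.2, 2.1, 2.0, 1.9, 9.0, 9.2]
# Recovered rewards whose grouping is unrelated to the true one.
bad_r = [0, 5, 1, 0, 5, 1, 0, 5]

true_labels = agnes_1d(true_r, 3)
score_good = nmi(true_labels, agnes_1d(good_r, 3))
score_bad = nmi(true_labels, agnes_1d(bad_r, 3))
print(score_good, score_bad)
```

In this toy setting the rescaled rewards score close to 1 even though their numeric values differ from the ground truth, while the unrelated rewards score close to 0. A cluster-level comparison of this kind is insensitive to reward scale, which a pointwise reward loss is not.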
Pages: 1089-1100
Number of pages: 11