A Mutual Information-Based Assessment of Reverse Engineering on Rewards of Reinforcement Learning

Authors
Chen T. [1]
Liu J. [1]
Baker T. [2]
Wu Y. [1]
Xiang Y. [1]
Li Y. [1]
Niu W. [1]
Tong E. [1]
Zomaya A.Y. [3]
Affiliations
[1] Beijing Jiaotong University, Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing
[2] College of Computing and Informatics, University of Sharjah, Department of Computer Science, Sharjah
[3] The University of Sydney, School of Computer Science, Sydney
Funding
National Natural Science Foundation of China
Keywords
Assessment; mutual information; reinforcement learning (RL); reverse engineering; tensor model
DOI
10.1109/TAI.2022.3190811
Abstract
Rewards are critical hyperparameters in reinforcement learning (RL), since in most cases different reward values lead to greatly different performance. Due to their commercial value, RL rewards have become a target of reverse engineering by the inverse reinforcement learning (IRL) algorithm family. Existing efforts typically use two metrics to measure IRL performance: the expected value difference (EVD) and the mean reward loss (MRL). Unfortunately, in some cases EVD and MRL give completely opposite results, because MRL considers rewards over the whole state space, whereas EVD considers only partially sampled rewards. This situation naturally raises one fundamental question: are current metrics and assessments sufficient and accurate for more general use? In this article, we therefore propose a novel IRL assessment based on a metric called the normalized mutual information of reward clusters (C-NMI); we aim to fill this research gap by considering a middle-granularity state space between the entire state space and the specific sampling space. We use the agglomerative nesting algorithm (AGNES) to control dynamic C-NMI computation via a fourth-order tensor model with injected manipulated trajectories. With such a model, we can uniformly capture the different-dimension values of MRL, EVD, and C-NMI, and perform more comprehensive and accurate assessment and analysis. Extensive experiments on several mainstream IRL algorithms are conducted in the object world environment, revealing that the assessment accuracy of our method increases by 110.13% and 116.59% compared with EVD and MRL, respectively. Meanwhile, C-NMI is more robust than EVD and MRL under different demonstrations. © 2020 IEEE.
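The idea of comparing reward clusters with normalized mutual information can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`agglomerative_1d`, `nmi`, `c_nmi`), the centroid-linkage merge rule, the fixed cluster count `k`, and the arithmetic-mean normalization of NMI are all assumptions made for illustration, and the fourth-order tensor model and manipulated trajectories from the abstract are omitted.

```python
import math
from collections import Counter


def agglomerative_1d(values, k):
    """Bottom-up (AGNES-style) clustering of scalar rewards into k groups.

    Assumption: centroid linkage. In one dimension the two closest cluster
    centroids are always neighbours in sorted order, so only adjacent
    clusters need to be compared.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [[i] for i in order]  # start with one cluster per state

    def mean(cluster):
        return sum(values[i] for i in cluster) / len(cluster)

    while len(clusters) > k:
        gaps = [mean(clusters[j + 1]) - mean(clusters[j])
                for j in range(len(clusters) - 1)]
        j = gaps.index(min(gaps))  # closest adjacent pair
        clusters[j:j + 2] = [clusters[j] + clusters[j + 1]]

    labels = [0] * len(values)
    for label, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = label
    return labels


def nmi(a, b):
    """Normalized mutual information of two labelings.

    Assumption: normalization by the arithmetic mean of the two entropies.
    """
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    ha = -sum(c / n * math.log(c / n) for c in pa.values())
    hb = -sum(c / n * math.log(c / n) for c in pb.values())
    if ha + hb == 0:  # both labelings put every state in one cluster
        return 1.0
    return 2 * mi / (ha + hb)


def c_nmi(true_rewards, recovered_rewards, k):
    """Cluster both reward vectors, then compare the cluster assignments."""
    return nmi(agglomerative_1d(true_rewards, k),
               agglomerative_1d(recovered_rewards, k))


# Hypothetical per-state rewards for a six-state toy problem.
true_rewards = [1.0, 1.0, 0.0, 0.0, -1.0, -1.0]
rescaled = [2.1, 1.9, 0.2, -0.1, -2.0, -2.2]   # same structure, other scale
shuffled = [0.0, 1.0, 1.0, -1.0, 0.0, -1.0]    # structure not recovered

score_rescaled = c_nmi(true_rewards, rescaled, k=3)
score_shuffled = c_nmi(true_rewards, shuffled, k=3)
print(score_rescaled, score_shuffled)
```

In this toy case the rescaled rewards produce the same cluster structure as the true rewards, so their score is close to 1 even though the raw values differ, while the shuffled rewards score noticeably lower. A cluster-level comparison of this kind is insensitive to the scale of the recovered rewards, which is one plausible reading of the "middle-granularity" view described in the abstract.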
Pages: 1089-1100
Number of pages: 11