A Mutual Information-Based Assessment of Reverse Engineering on Rewards of Reinforcement Learning

Authors
Chen T. [1]
Liu J. [1]
Baker T. [2]
Wu Y. [1]
Xiang Y. [1]
Li Y. [1]
Niu W. [1]
Tong E. [1]
Zomaya A.Y. [3]
Affiliations
[1] Beijing Jiaotong University, Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing
[2] University of Sharjah, College of Computing and Informatics, Department of Computer Science, Sharjah
[3] The University of Sydney, School of Computer Science, Sydney
Funding
National Natural Science Foundation of China
Keywords
Assessment; mutual information; reinforcement learning (RL); reverse engineering; tensor model
DOI
10.1109/TAI.2022.3190811
Abstract
Rewards are critical hyperparameters in reinforcement learning (RL), since in most cases different reward values lead to greatly different performance. Because of their commercial value, RL rewards have become targets of reverse engineering by the inverse reinforcement learning (IRL) algorithm family. Existing work typically uses two metrics to measure IRL performance: the expected value difference (EVD) and the mean reward loss (MRL). Unfortunately, in some cases EVD and MRL give completely opposite results, because MRL considers rewards over the whole state space, whereas EVD considers only partially sampled rewards. This naturally raises a fundamental question: are current metrics and assessments sufficient and accurate for more general use? In this article, we propose a novel IRL assessment based on a metric called the normalized mutual information of reward clusters (C-NMI). We aim to fill this research gap by considering a middle-granularity state space between the entire state space and the specific sampling space. We use the agglomerative nesting algorithm (AGNES) to control dynamic C-NMI computation via a fourth-order tensor model with injected manipulated trajectories. With such a model, we can uniformly capture different-dimension values of MRL, EVD, and C-NMI, and perform more comprehensive and accurate assessment and analyses. Extensive experiments with several mainstream IRL algorithms in the object world environment reveal that the assessment accuracy of our method increases by 110.13% and 116.59% compared with EVD and MRL, respectively. Meanwhile, C-NMI is more robust than EVD and MRL under different demonstrations. © 2020 IEEE.
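The core idea of a reward-cluster NMI can be illustrated with a small sketch: cluster the true and the recovered per-state rewards separately with AGNES, then measure how well the two partitions agree with normalized mutual information. This is not the authors' C-NMI definition. It assumes one scalar reward per state, average linkage, a fixed cluster count `k`, and arithmetic-mean NMI normalization, and it does not model the fourth-order tensor or the injected trajectories. The function names `agnes`, `nmi`, and `reward_cluster_nmi` are hypothetical.

```python
import math
from collections import Counter


def agnes(values, k):
    """AGNES (bottom-up agglomerative clustering) with average linkage on 1-D values.

    Assumption: rewards are scalars and the target cluster count k is fixed.
    Returns one cluster label per value.
    """
    clusters = [[i] for i in range(len(values))]  # start: one cluster per state
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean absolute distance over all cross pairs.
                dist = sum(abs(values[i] - values[j])
                           for i in clusters[a] for j in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (b > a)
        del clusters[b]
    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """NMI with arithmetic-mean normalization: 2 * I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_a * count_b).
    mi = sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in joint.items())
    h = entropy(labels_a) + entropy(labels_b)
    return 1.0 if h == 0 else 2 * mi / h  # both single-cluster: identical


def reward_cluster_nmi(true_rewards, learned_rewards, k):
    """Hypothetical reward-cluster NMI: agreement of AGNES partitions."""
    return nmi(agnes(true_rewards, k), agnes(learned_rewards, k))


if __name__ == "__main__":
    true_r = [0.0, 0.1, 0.05, 1.0, 0.95, 1.1, -1.0, -0.9]
    scaled = [2 * r + 3 for r in true_r]  # same structure, different scale
    shuffled = [1.0, -1.0, 0.0, 0.05, 1.1, -0.9, 0.1, 0.95]
    print(reward_cluster_nmi(true_r, scaled, k=3))
    print(reward_cluster_nmi(true_r, shuffled, k=3))
```

Because NMI compares partitions rather than raw values, an affinely rescaled reward scores close to 1, while a reward that assigns the same values to the wrong states scores much lower. This is one way a cluster-level metric can diverge from a value-level loss such as MRL.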
Pages: 1089-1100
Page count: 11