A Mutual Information-Based Assessment of Reverse Engineering on Rewards of Reinforcement Learning

Cited by: 0
Authors
Chen T. [1]
Liu J. [1]
Baker T. [2]
Wu Y. [1]
Xiang Y. [1]
Li Y. [1]
Niu W. [1]
Tong E. [1]
Zomaya A.Y. [3]
Affiliations
[1] Beijing Jiaotong University, Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing
[2] College of Computing and Informatics, University of Sharjah, Department of Computer Science, Sharjah
[3] The University of Sydney, School of Computer Science, Sydney
Funding
National Natural Science Foundation of China
Keywords
Assessment; mutual information; reinforcement learning (RL); reverse engineering; tensor model
DOI
10.1109/TAI.2022.3190811
Abstract
Rewards are critical hyperparameters in reinforcement learning (RL), since in most cases different reward values lead to greatly different performance. Because of their commercial value, RL rewards have become a target of reverse engineering by the inverse reinforcement learning (IRL) family of algorithms. Existing efforts typically use two metrics to measure IRL performance: the expected value difference (EVD) and the mean reward loss (MRL). Unfortunately, in some cases EVD and MRL give completely opposite results, because MRL considers rewards over the whole state space, whereas EVD considers only partially sampled rewards. This situation naturally raises a fundamental question: are the current metrics and assessments sufficient and accurate for more general use? In this article, we therefore propose a novel IRL assessment based on a metric called the normalized mutual information of reward clusters (C-NMI); it aims to fill this research gap by considering a middle-granularity state space between the entire state space and the specific sampling space. We use the agglomerative nesting algorithm (AGNES) to control dynamic C-NMI computation via a fourth-order tensor model with injected manipulated trajectories. With such a model, we can uniformly capture the different-dimension values of MRL, EVD, and C-NMI, and perform more comprehensive and accurate assessment and analysis. Extensive experiments on several mainstream IRL algorithms are conducted in the object world environment, revealing that the assessment accuracy of our method increases by 110.13% and 116.59% compared with EVD and MRL, respectively. Meanwhile, C-NMI is more robust than EVD and MRL under different demonstrations. © 2020 IEEE.
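The idea of comparing reward clusters by normalized mutual information can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (agnes_1d, nmi), the centroid-distance merge rule, the geometric-mean normalization, the choice of two clusters, and the toy reward vectors are all assumptions made for illustration, and the fourth-order tensor model and trajectory injection described in the abstract are not reproduced.

```python
import math
from collections import Counter


def agnes_1d(values, n_clusters):
    """Bottom-up (AGNES-style) clustering of scalar rewards.

    Assumption: clusters are merged by smallest distance between cluster
    means; the paper's actual linkage rule is not given in the abstract.
    """
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                mean_a = sum(values[i] for i in clusters[a]) / len(clusters[a])
                mean_b = sum(values[i] for i in clusters[b]) / len(clusters[b])
                dist = abs(mean_a - mean_b)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]
    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def nmi(labels_a, labels_b):
    """Normalized mutual information of two labelings.

    Assumption: normalization by the geometric mean of the two entropies.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


# Hypothetical per-state rewards: ground truth and two recovered vectors.
true_rewards = [0.0, 0.1, 0.9, 1.0, 0.05, 0.95]
recovered_good = [2.0, 2.2, 5.0, 5.1, 2.1, 4.9]  # rescaled, same structure
recovered_bad = [2.0, 5.0, 2.1, 5.1, 2.2, 4.9]   # structure scrambled

true_labels = agnes_1d(true_rewards, 2)
c_nmi_good = nmi(true_labels, agnes_1d(recovered_good, 2))
c_nmi_bad = nmi(true_labels, agnes_1d(recovered_bad, 2))
print(c_nmi_good, c_nmi_bad)
```

In this toy setting the rescaled reward vector keeps the same cluster structure as the ground truth, so its score is high even though its raw values differ, while the scrambled vector scores low. This is the kind of cluster-level agreement, as opposed to value-level difference, that a metric such as C-NMI is meant to capture.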
Pages: 1089-1100
Page count: 11