Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Cited: 0
Authors
Garcia, Javier [1 ]
Rano, Inaki [1 ]
Bures, J. Miguel [2 ]
Fdez-Vidal, Xose R. [2 ]
Iglesias, Roberto [2 ]
Affiliations
[1] Univ Santiago de Compostela, Dept Elect & Comp Sci, Lugo, Spain
[2] Univ Santiago de Compostela, CiTIUS Ctr Invest Tecnoloxias Intelixentes, Santiago de Compostela, Spain
Keywords
Reinforcement learning; Transfer learning; Inter-task mapping
DOI
10.1007/s10489-024-06190-7
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or a change in the representation of actions). After such changes, a previously learnt policy will likely fail due to the mismatch between its input and output features, and a new policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent work in transfer learning has made RL algorithms more efficient by incorporating knowledge from previous tasks, partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but it should be able to analyze its own experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence between the states and actions of one task and those of another from the agent's experience. In contrast to previous approaches, our method rests on two key insights: i) only the first states of the trajectories of the two tasks are paired, while the remaining states are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thereby aligning the unpaired states and actions. Additionally, this paper deliberately decouples the learning of the state-action correspondence from the transfer technique used, so the learned correspondence can be combined with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, compared to state-of-the-art algorithms that rely on cycle-consistency.
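The two key insights above lend themselves to a compact illustration. The following Python/PyTorch sketch is an illustrative reading of the abstract, not the authors' implementation: every name in it (f_src, g_s, g_a, train_correspondence) is hypothetical, and the losses are one plausible way to instantiate "dynamics alignment anchored by paired first states".

```python
# Illustrative sketch only -- NOT the authors' code.  Assumes a frozen
# transition model f_src(s, a) -> s' of the SOURCE task is available, plus
# randomly collected (unpaired) transitions from the TARGET task and a
# pair of initial states.  All names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Small fully connected network used for both mappings."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def train_correspondence(f_src, target_batches, paired_s0,
                         dims, epochs=100, lr=1e-3):
    """Learn g_s: target state -> source state and g_a: target action ->
    source action.  target_batches yields (s, a, s_next) tensors from the
    target task; paired_s0 = (s0_target, s0_source) are the only explicitly
    paired states, as in insight i) of the abstract."""
    s_dim_t, a_dim_t, s_dim_s, a_dim_s = dims
    g_s = MLP(s_dim_t, s_dim_s)
    g_a = MLP(a_dim_t, a_dim_s)
    opt = torch.optim.Adam(list(g_s.parameters()) + list(g_a.parameters()),
                           lr=lr)
    s0_t, s0_s = paired_s0
    for _ in range(epochs):
        for s, a, s_next in target_batches:
            # Insight ii): the SOURCE transition model must explain the
            # mapped TARGET transition (dynamics alignment).
            pred_next = f_src(g_s(s), g_a(a))
            dyn_loss = F.mse_loss(pred_next, g_s(s_next))
            # Insight i): anchor the state map with the paired first states
            # to rule out degenerate (e.g., constant) mappings.
            anchor_loss = F.mse_loss(g_s(s0_t), s0_s)
            loss = dyn_loss + anchor_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return g_s, g_a
```

Note that the dynamics-consistency loss alone would admit degenerate solutions (e.g., a constant state map), which is presumably why the paired initial states are needed as an anchor. The learned maps g_s and g_a could then be plugged into any downstream transfer method, consistent with the paper's decoupling of correspondence learning from the transfer technique.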
Pages: 18
Related papers (50 total)
  • [41] Competitive reinforcement learning in continuous control tasks
    Abramson, M
    Pachowicz, P
    Wechsler, H
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 1909 - 1914
  • [42] Reinforcement learning in dynamic environment: Abstraction of state-action space utilizing properties of the robot body and environment
    Takeuchi, Yutaka
    Ito, Kazuyuki
    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 17TH '12), 2012, : 938 - 942
  • [43] How Should Learning Classifier Systems Cover A State-Action Space?
    Nakata, Masaya
    Lanzi, Pier Luca
    Kovacs, Tim
    Browne, Will Neil
    Takadama, Keiki
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015, : 3012 - 3019
  • [44] Autonomous control of real snake-like robot using reinforcement learning - Abstraction of state-action space using properties of real world
    Ito, Kazuyuki
    Fukumori, Yoshitaka
    Takayama, Akihiro
    PROCEEDINGS OF THE 2007 INTERNATIONAL CONFERENCE ON INTELLIGENT SENSORS, SENSOR NETWORKS AND INFORMATION PROCESSING, 2007, : 389+
  • [45] Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks
    Guo, Yijie
    Wu, Qiucheng
    Lee, Honglak
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6792 - 6800
  • [46] PID-Inspired Inductive Biases for Deep Reinforcement Learning in Partially Observable Control Tasks
    Char, Ian
    Schneider, Jeff
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [47] Improving Reinforcement Learning control via online bilinear action interpolation
    Ribeiro, CHC
    Hemerly, EM
    VTH BRAZILIAN SYMPOSIUM ON NEURAL NETWORKS, PROCEEDINGS, 1998, : 102 - 105
  • [48] Learning State-Specific Action Masks for Reinforcement Learning
    Wang, Ziyi
    Li, Xinran
    Sun, Luoyang
    Zhang, Haifeng
    Liu, Hualin
    Wang, Jun
    ALGORITHMS, 2024, 17 (02)
  • [49] Safety reinforcement learning control via transfer learning
    Zhang, Quanqi
    Wu, Chengwei
    Tian, Haoyu
    Gao, Yabin
    Yao, Weiran
    Wu, Ligang
    AUTOMATICA, 2024, 166
  • [50] Learning to Control Camera Exposure via Reinforcement Learning
    Lee, Kyunghyun
    Shin, Ukcheol
    Lee, Byeong-Uk
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 2975 - 2983