Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Cited: 0
Authors
Garcia, Javier [1 ]
Rano, Inaki [1 ]
Bures, J. Miguel [2 ]
Fdez-Vidal, Xose R. [2 ]
Iglesias, Roberto [2 ]
Affiliations
[1] Univ Santiago de Compostela, Dept Elect & Comp Sci, Lugo, Spain
[2] Univ Santiago de Compostela, CiTIUS Ctr Invest Tecnoloxias Intelixentes, Santiago de Compostela, Spain
Keywords
Reinforcement learning; Transfer learning; Inter-task mapping
DOI
10.1007/s10489-024-06190-7
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or a change in the representation of actions). Under such changes, the previously learned policy will likely fail because its input and output features no longer match, and a new policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent work in transfer learning has made RL algorithms more efficient by incorporating knowledge from previous tasks, partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but it should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent's experience. In contrast to previous approaches, our method is based on two key insights: i) only the first state of the trajectories of the two tasks is paired, while the rest are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, the paper deliberately decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
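Read as pseudocode, the abstract's two insights suggest a simple alignment objective: learn a state map phi and an action map psi from the target task into the source task such that (i) the paired first states of each trajectory match under phi, and (ii) the frozen source transition model, applied to mapped target transitions, predicts the mapped next state. The PyTorch sketch below illustrates this reading; all names, dimensions, and the stand-in transition model are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a first-state-anchored dynamics-alignment objective.
# Assumptions (hypothetical): a pretrained, frozen source transition model
# f_src(s, a) -> s'; target trajectories whose FIRST state is paired with a
# source state; and otherwise unpaired target transitions (s, a, s_next).
import torch
import torch.nn as nn

S_TGT, A_TGT, S_SRC, A_SRC = 8, 2, 6, 2  # illustrative dimensions

# Maps from target states/actions into the source task's representation.
phi = nn.Sequential(nn.Linear(S_TGT, 64), nn.ReLU(), nn.Linear(64, S_SRC))
psi = nn.Sequential(nn.Linear(A_TGT, 64), nn.ReLU(), nn.Linear(64, A_SRC))

# Stand-in for the pretrained source transition model (frozen: only the
# correspondence maps phi and psi are trained).
f_src_net = nn.Sequential(nn.Linear(S_SRC + A_SRC, 64), nn.ReLU(),
                          nn.Linear(64, S_SRC))
f_src_net.requires_grad_(False)
f_src = lambda s, a: f_src_net(torch.cat([s, a], dim=-1))

opt = torch.optim.Adam(list(phi.parameters()) + list(psi.parameters()),
                       lr=1e-3)

def correspondence_loss(s0_tgt, s0_src, s, a, s_next):
    # (i) anchor term: only the first states of the two trajectories are paired
    anchor = ((phi(s0_tgt) - s0_src) ** 2).mean()
    # (ii) dynamics term: the source model should map the projected (state,
    # action) onto the projected next state, aligning the unpaired data
    dyn = ((f_src(phi(s), psi(a)) - phi(s_next)) ** 2).mean()
    return anchor + dyn

# One gradient step on a synthetic batch (shapes only, random data).
B = 32
loss = correspondence_loss(torch.randn(B, S_TGT), torch.randn(B, S_SRC),
                           torch.randn(B, S_TGT), torch.randn(B, A_TGT),
                           torch.randn(B, S_TGT))
opt.zero_grad(); loss.backward(); opt.step()
```

Because the learned maps phi and psi are trained independently of any particular transfer algorithm, they can, as the abstract notes, be plugged into any downstream transfer method that consumes a state-action correspondence.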
Pages: 18