Finding exploratory rewards by embodied evolution and constrained reinforcement learning in the Cyber Rodents

Authors
Uchibe, Eiji [1]
Doya, Kenji [1]
Affiliation
[1] Okinawa Inst Sci & Technol, Okinawa 9042234, Japan
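Source
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract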
The aim of the Cyber Rodent project [1] is to elucidate the origin of our reward and affective systems by building artificial agents that share natural biological constraints: self-preservation (foraging) and self-reproduction (mating). This paper presents a method for evolving an agent's exploratory reward by combining the framework of embodied evolution with a constrained policy gradient reinforcement learning algorithm. Biological constraints are modeled by average criteria, and the exploratory reward is computed from the agent's own sensor information. An agent that satisfies part of the constraints is allowed to mate with another agent. If two agents mate successfully, one of the genetic operations is applied according to their fitness values to improve the exploratory rewards. Through learning and embodied evolution, a group of agents obtains appropriate exploratory rewards.
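The mating step described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the `Agent` class, the linear form of the exploratory reward over sensor readings, the threshold test on average constraint values, and the choice of uniform crossover followed by Gaussian mutation as the genetic operation applied to the less fit agent.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical genome: weights of a linear exploratory reward.
    reward_weights: list
    # Hypothetical scalar fitness used to pick the genetic operation's target.
    fitness: float
    # Running averages of the constraint signals (e.g. foraging), assumed given.
    constraint_avgs: list
    # Thresholds the averages must reach for a constraint to count as satisfied.
    thresholds: list

    def exploratory_reward(self, sensors):
        """Exploratory reward computed from the agent's own sensor readings."""
        return sum(w * s for w, s in zip(self.reward_weights, sensors))

    def can_mate(self, required=(0,)):
        """True if the required subset of constraints is satisfied."""
        return all(self.constraint_avgs[i] >= self.thresholds[i] for i in required)


def mate(a, b, rng, sigma=0.1):
    """Assumed genetic operation: the less fit agent receives a uniform
    crossover of both genomes plus Gaussian mutation. Returns the updated
    agent, or None if either agent is not allowed to mate."""
    if not (a.can_mate() and b.can_mate()):
        return None
    better, worse = (a, b) if a.fitness >= b.fitness else (b, a)
    child = [
        bw if rng.random() < 0.5 else ww
        for bw, ww in zip(better.reward_weights, worse.reward_weights)
    ]
    worse.reward_weights = [w + rng.gauss(0.0, sigma) for w in child]
    return worse


if __name__ == "__main__":
    rng = random.Random(0)
    a = Agent([0.5, 2.0], fitness=2.0, constraint_avgs=[1.0], thresholds=[0.5])
    b = Agent([1.5, -1.0], fitness=1.0, constraint_avgs=[0.8], thresholds=[0.5])
    updated = mate(a, b, rng)
    print(updated.reward_weights)
```

In this sketch the constrained reinforcement learning part is left out entirely; `constraint_avgs` and `fitness` stand in for quantities that learning would produce during an agent's lifetime.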
Pages: 167-176
Page count: 10
Related papers
10 in total
  • [1] Finding intrinsic rewards by embodied evolution and constrained reinforcement learning
    Uchibe, Eiji
    Doya, Kenji
    [J]. NEURAL NETWORKS, 2008, 21 (10) : 1447 - 1455
  • [2] Constrained reinforcement learning from intrinsic and extrinsic rewards
    Uchibe, Eiji
    Doya, Kenji
    [J]. 2007 IEEE 6TH INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING, 2007, : 45 - +
  • [3] State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning With Rewards
    Calvo-Fullana, Miguel
    Paternain, Santiago
    Chamon, Luiz F. O.
    Ribeiro, Alejandro
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (07) : 4275 - 4290
  • [4] Finding the Optimal Security Policies for Autonomous Cyber Operations With Competitive Reinforcement Learning
    McDonald, Garrett
    Li, Li
    Mallah, Ranwa Al
    [J]. IEEE ACCESS, 2024, 12 : 120292 - 120305
  • [5] Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning
    Elfwing, Stefan
    Uchibe, Eiji
    Doya, Kenji
    Christensen, Henrik I.
    [J]. ADAPTIVE BEHAVIOR, 2008, 16 (06) : 400 - 412
  • [6] Reinforcement Learning Methods for Finding Equilibria and Tracking Evolution Paths in Conflicts
    Li, Donghua
    Jiang, Ju
    Xu, Haiyan
    Hipel, Keith W.
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 3291 - +
  • [7] Deep reinforcement learning assisted co-evolutionary differential evolution for constrained optimization
    Hu, Zhenzhen
    Gong, Wenyin
    Pedrycz, Witold
    Li, Yanchi
    [J]. SWARM AND EVOLUTIONARY COMPUTATION, 2023, 83
  • [8] Reinforcement learning-based differential evolution algorithm for constrained multi-objective optimization problems
    Yu, Xiaobing
    Xu, Pingping
    Wang, Feng
    Wang, Xuming
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 131
  • [9] A constrained multi-item EOQ inventory model for reusable items: Reinforcement learning-based differential evolution and particle swarm optimization
    Fallahi, Ali
    Bani, Erfan Amani
    Niaki, Seyed Taghi Akhavan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 207
  • [10] A constrained multi-item EOQ inventory model for reusable items: Reinforcement learning-based differential evolution and particle swarm optimization
    Fallahi, Ali
    Amani Bani, Erfan
    Niaki, Seyed Taghi Akhavan
    [J]. Expert Systems with Applications, 2022, 207