Autonomous Landing of the Quadrotor on the Mobile Platform via Meta Reinforcement Learning

Cited: 0
|
Authors
Cao, Qianqian [1 ,2 ]
Liu, Ziyi [1 ,2 ]
Yu, Hai [1 ,2 ]
Liang, Xiao [1 ,2 ]
Fang, Yongchun [1 ,2 ]
Affiliations
[1] Nankai Univ, Coll Artificial Intelligence, Inst Robot & Automat Informat Syst, Tianjin 300350, Peoples R China
[2] Nankai Univ, Tianjin Key Lab Intelligent Robot, Tianjin 300350, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quadrotor; meta reinforcement learning; autonomous landing; trajectory planning and control;
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Landing a quadrotor on a mobile platform moving along various unknown trajectories presents special challenges, including the requirements of fast trajectory planning/replanning, accurate control, and adaptability to different target trajectories, especially when the platform is non-cooperative. However, previous works either assume the platform moves along a predefined trajectory or decouple planning from control, which may cause a delay in tracking. In this work, we integrate planning and control into a unified framework and present an efficient off-policy Meta-Reinforcement Learning (Meta-RL) algorithm that enables a quadrotor (agent) to land autonomously on a mobile platform following various unknown trajectories. In our approach, we disentangle task-specific policy parameters from shared low-level parameters via a separate adapter network and learn a probabilistic encoder to extract common structure across different tasks. Specifically, during meta-training, we sample different trajectories from the task distribution, and the probabilistic encoder accumulates the necessary statistics from past experience into latent variables that enable the policy to perform the task. At meta-testing time, when the quadrotor faces an unseen trajectory, the latent variables can be sampled according to past interactions between the quadrotor and the mobile platform and held constant during an episode, enabling rapid trajectory-level adaptation. We assume that similar tasks share a common low-dimensional structure in the representation of the policy network and that the task-specific information is learned in the head of the policy. Accordingly, we further propose a separate adapter net, trained as a supervised learning problem: it learns the weights of the policy's output layer for each meta-training task from the agent's environment interactions.
When adapting to a new task during meta-testing, we fix the shared model layers and predict the head weights for the new task using the trained adapter network. This ensures that the pretrained policy can efficiently adapt to different tasks, which boosts out-of-distribution performance. Our method directly controls the pitch, roll, and yaw angles and the thrust of the quadrotor, yielding a fast response to trajectory changes. Simulation results show the superiority of our method in both success rate and adaptation efficiency over other RL algorithms on meta-testing tasks. Real-world experimental results, compared with traditional planning and control algorithms, demonstrate the satisfactory performance of our autonomous landing method, especially its robustness in adapting to unknown dynamics.

Note to Practitioners: Given the challenge posed by motion uncertainty when a quadrotor lands on a mobile platform with an unknown trajectory, there has been no well-established solution, as far as we know. This paper introduces meta-reinforcement learning, incorporating a latent-variable encoder to extract common features from training tasks and designing an adapter network to improve the ability of the policy network to adapt to new tasks, thereby enhancing the landing performance of the agent. The proposed method demonstrates promising results in both simulation and experiments.
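As a rough illustration of the adapter idea in the abstract, the following NumPy sketch predicts a task-specific output layer (the policy "head") from a latent task variable while the shared trunk stays fixed. All names, dimensions, and the linear-layer structure here are hypothetical choices for illustration, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper).
OBS_DIM, LATENT_DIM, HIDDEN, ACT_DIM = 12, 8, 64, 4  # actions: roll, pitch, yaw, thrust

# Shared low-level trunk, frozen at meta-test time.
W_trunk = rng.standard_normal((HIDDEN, OBS_DIM)) * 0.1

# Hypothetical adapter net: maps a task latent z to the head weights.
W_adapt = rng.standard_normal((HIDDEN * ACT_DIM, LATENT_DIM)) * 0.1

def predict_head(z):
    """Adapter output reshaped into the task-specific output-layer weights."""
    return (W_adapt @ z).reshape(ACT_DIM, HIDDEN)

def policy(obs, z):
    """Shared trunk features combined with the adapter-predicted head."""
    h = np.tanh(W_trunk @ obs)   # shared low-dimensional representation
    return predict_head(z) @ h   # task-specific action (roll, pitch, yaw, thrust)

# At meta-test time: infer z from past interactions (here just sampled),
# then hold it constant for the whole episode.
z = rng.standard_normal(LATENT_DIM)
action = policy(rng.standard_normal(OBS_DIM), z)
```

Only the adapter (and, in the paper, the latent encoder) carries task-specific information; the trunk is shared, which is what makes per-trajectory adaptation cheap at meta-test time.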
Pages: 2269-2280
Page count: 12
Related Papers
50 records in total
  • [1] Autonomous Landing of the Quadrotor on the Mobile Platform via Meta Reinforcement Learning
    Cao, Qianqian
    Liu, Ziyi
    Yu, Hai
    Liang, Xiao
    Fang, Yongchun
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 2269 - 2280
  • [2] A Reinforcement Learning Approach for Autonomous Control and Landing of a Quadrotor
    Vankadari, Madhu Babu
    Das, Kaushik
    Shinde, Chinmay
    Kumar, Swagat
    2018 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS), 2018, : 676 - 683
  • [3] Deep Reinforcement Learning with Corrective Feedback for Autonomous UAV Landing on a Mobile Platform
    Wu, Lizhen
    Wang, Chang
    Zhang, Pengpeng
    Wei, Changyun
    DRONES, 2022, 6 (09)
  • [4] Autonomous Landing of a Quadrotor on a Moving Platform
    Ghommam, Jawhar
    Saad, Maarouf
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2017, 53 (03) : 1504 - 1519
  • [5] Autonomous Landing of a Quadrotor on a Moving Platform via Model Predictive Control
    Guo, Kaiyang
    Tang, Pan
    Wang, Hui
    Lin, Defu
    Cui, Xiaoxi
    AEROSPACE, 2022, 9 (01)
  • [6] Monocular Vision based Autonomous Landing of Quadrotor through Deep Reinforcement Learning
    Xu, Yinbo
    Liu, Zhihong
    Wang, Xiangke
    2018 37TH CHINESE CONTROL CONFERENCE (CCC), 2018, : 10014 - 10019
  • [7] A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Platform
    Jiang, Zhiling
    Song, Guanghua
    2022 INTERNATIONAL CONFERENCE ON COMPUTING, ROBOTICS AND SYSTEM SCIENCES, ICRSS, 2022, : 104 - 109
  • [8] Cooperative Landing on Mobile Platform for Multiple Unmanned Aerial Vehicles via Reinforcement Learning
    Xu, Yahao
    Li, Jingtai
    Wu, Bi
    Wu, Junqi
    Deng, Hongbin
    Hui, David
    JOURNAL OF AEROSPACE ENGINEERING, 2024, 37 (01)
  • [9] Autonomous Visual Tracking and Landing of a Quadrotor on a Moving Platform
    Ajmera, Juhi
    Siddharthan, P. R.
    Ramaravind, K. M.
    Vasan, Gautham
    Balaji, Naresh
    Sankaranarayanan, V.
    2015 THIRD INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP), 2015, : 342 - 347
  • [10] A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Moving Platform
    Rodriguez-Ramos, Alejandro
    Sampedro, Carlos
    Bavle, Hriday
    de la Puente, Paloma
    Campoy, Pascual
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2019, 93 (1-2) : 351 - 366