Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics

Cited by: 1
Authors
Berger, Sandrine [1 ]
Ramo, Andrea Arroyo [1 ]
Guillet, Valentin [2 ]
Lahire, Thibault [2 ]
Martin, Brice [2 ]
Jardin, Thierry [1 ]
Rachelson, Emmanuel [2 ]
Affiliations
[1] Univ Toulouse, Dept Aerodynam & Prop, ISAE SUPAERO, Toulouse, France
[2] Univ Toulouse, Dept Complex Syst Engn, ISAE SUPAERO, Toulouse, France
Source
DATA-CENTRIC ENGINEERING
Keywords
Benchmark for aerodynamics; computational fluid dynamics; deep reinforcement learning; off-policy algorithms; reliability; neural networks; flows
DOI
10.1017/dce.2023.28
CLC classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) is promising for solving control problems in fluid mechanics, but it is a young field with many open questions. Possibilities are numerous and guidelines are rare concerning the choice of algorithm or the best formulation for a given problem. Moreover, DRL algorithms learn a control policy by collecting samples from an environment, which may be very costly when the environment is a Computational Fluid Dynamics (CFD) solver. Algorithms must therefore minimize the number of samples required for learning (sample efficiency) and generate a usable policy from each training run (reliability). This paper aims to (a) evaluate three existing algorithms (DDPG, TD3, and SAC) on a fluid mechanics problem with respect to reliability and sample efficiency across a range of training configurations, (b) establish a fluid mechanics benchmark of increasing data collection cost, and (c) provide practical guidelines and insights for the fluid dynamics practitioner. The benchmark consists of controlling an airfoil to reach a target. The problem is solved with either a low-cost, low-order model or with a high-fidelity CFD approach. The study found that DDPG and TD3 have learning stability issues that depend strongly on the DRL hyperparameters and reward formulation, and therefore require significant tuning. In contrast, SAC is shown to be both reliable and sample efficient across a wide range of parameter setups, making it well suited to solving fluid mechanics problems and to setting up new cases without tremendous effort. In particular, SAC is resistant to small replay buffers, which could be critical if full flow fields were to be stored.
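For readers who want to reproduce this kind of reliability comparison on their own environment, a minimal sketch is given below. It is not the paper's code: it assumes the Stable-Baselines3 implementations of DDPG, TD3, and SAC, uses Gymnasium's Pendulum-v1 task as a stand-in for the airfoil environment (which is not specified in this record), and the buffer sizes, training budget, and seed count are illustrative rather than those used in the study.

# Minimal sketch: compare DDPG, TD3, and SAC across replay buffer sizes and random seeds.
# Assumptions: Stable-Baselines3 and Gymnasium installed; Pendulum-v1 stands in for the airfoil case.
import gymnasium as gym
import numpy as np
from stable_baselines3 import DDPG, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

ALGOS = {"DDPG": DDPG, "TD3": TD3, "SAC": SAC}
BUFFER_SIZES = [1_000, 10_000, 100_000]   # illustrative; small buffers probe memory constraints
N_SEEDS = 3                                # more seeds give a better reliability estimate
TRAIN_STEPS = 20_000                       # illustrative training budget

results = {}
for name, algo in ALGOS.items():
    for buffer_size in BUFFER_SIZES:
        returns = []
        for seed in range(N_SEEDS):
            env = gym.make("Pendulum-v1")
            model = algo("MlpPolicy", env, buffer_size=buffer_size, seed=seed, verbose=0)
            model.learn(total_timesteps=TRAIN_STEPS)
            mean_ret, _ = evaluate_policy(model, env, n_eval_episodes=10)
            returns.append(mean_ret)
        # Sample efficiency: mean return for a fixed budget; reliability: spread across seeds.
        results[(name, buffer_size)] = (np.mean(returns), np.std(returns))

for (name, buffer_size), (mean_ret, std_ret) in sorted(results.items()):
    print(f"{name:4s} buffer={buffer_size:>7d}  return={mean_ret:8.1f} +/- {std_ret:.1f}")

A run ordered by the standard deviation column gives a rough picture of which algorithm produces a usable policy most consistently for a given buffer size, which is the sense in which the abstract uses "reliability."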
Pages: 32