Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics

Cited by: 1
Authors
Berger, Sandrine [1 ]
Ramo, Andrea Arroyo [1 ]
Guillet, Valentin [2 ]
Lahire, Thibault [2 ]
Martin, Brice [2 ]
Jardin, Thierry [1 ]
Rachelson, Emmanuel [2 ]
Affiliations
[1] Univ Toulouse, Dept Aerodynam & Prop, ISAE SUPAERO, Toulouse, France
[2] Univ Toulouse, Dept Complex Syst Engn, ISAE SUPAERO, Toulouse, France
Source
DATA-CENTRIC ENGINEERING
Keywords
Benchmark for aerodynamics; computational fluid dynamics; deep reinforcement learning; off-policy algorithms; reliability; neural networks; flows
DOI
10.1017/dce.2023.28
CLC classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) is promising for solving control problems in fluid mechanics, but it is a young field with many open questions. Possibilities are numerous and guidelines are rare concerning the choice of algorithm or the best formulation for a given problem. Moreover, DRL algorithms learn a control policy by collecting samples from an environment, which may be very costly when the environment is a Computational Fluid Dynamics (CFD) solver. Algorithms must therefore minimize the number of samples required for learning (sample efficiency) and generate a usable policy from each training run (reliability). This paper aims to (a) evaluate three existing algorithms (DDPG, TD3, and SAC) on a fluid mechanics problem with respect to reliability and sample efficiency across a range of training configurations, (b) establish a fluid mechanics benchmark of increasing data collection cost, and (c) provide practical guidelines and insights for the fluid dynamics practitioner. The benchmark consists of controlling an airfoil to reach a target. The problem is solved with either a low-cost, low-order model or with a high-fidelity CFD approach. The study found that DDPG and TD3 have learning stability issues that depend strongly on the DRL hyperparameters and reward formulation, and therefore require significant tuning. In contrast, SAC is shown to be both reliable and sample efficient across a wide range of parameter setups, making it well suited to solving fluid mechanics problems and to setting up new cases without tremendous effort. In particular, SAC is resistant to small replay buffers, which could be critical if full flow fields were to be stored.
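For readers who want to reproduce this kind of reliability comparison on their own environment, a minimal sketch is given below. It is not the paper's code: it assumes the Stable-Baselines3 implementations of DDPG, TD3, and SAC, uses Gymnasium's Pendulum-v1 task as a stand-in for the airfoil environment (which is not specified in this record), and the buffer sizes, training budget, and seed count are illustrative rather than those used in the study.

# Minimal sketch: compare DDPG, TD3, and SAC across replay buffer sizes and random seeds.
# Assumptions: Stable-Baselines3 and Gymnasium installed; Pendulum-v1 stands in for the airfoil case.
import gymnasium as gym
import numpy as np
from stable_baselines3 import DDPG, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

ALGOS = {"DDPG": DDPG, "TD3": TD3, "SAC": SAC}
BUFFER_SIZES = [1_000, 10_000, 100_000]   # illustrative; small buffers probe memory constraints
N_SEEDS = 3                                # more seeds give a better reliability estimate
TRAIN_STEPS = 20_000                       # illustrative training budget

results = {}
for name, algo in ALGOS.items():
    for buffer_size in BUFFER_SIZES:
        returns = []
        for seed in range(N_SEEDS):
            env = gym.make("Pendulum-v1")
            model = algo("MlpPolicy", env, buffer_size=buffer_size, seed=seed, verbose=0)
            model.learn(total_timesteps=TRAIN_STEPS)
            mean_ret, _ = evaluate_policy(model, env, n_eval_episodes=10)
            returns.append(mean_ret)
        # Sample efficiency: mean return for a fixed budget; reliability: spread across seeds.
        results[(name, buffer_size)] = (np.mean(returns), np.std(returns))

for (name, buffer_size), (mean_ret, std_ret) in sorted(results.items()):
    print(f"{name:4s} buffer={buffer_size:>7d}  return={mean_ret:8.1f} +/- {std_ret:.1f}")

A run ordered by the standard deviation column gives a rough picture of which algorithm produces a usable policy most consistently for a given buffer size, which is the sense in which the abstract uses "reliability."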
Pages: 32