Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC

Cited by: 3

Authors:

Brewer, Wesley [1]

Martinez, Daniel [2]

Boyer, Mathew [1]

Jude, Dylan [3]

Wissink, Andy [3]

Parsons, Ben [4]

Yin, Junqi [5]

Anantharaj, Valentine [5]

Affiliations:

[1] DoD HPCMP PET GDIT, Vicksburg, MS 39335 USA

[2] Sci & Technol Corp, Moffett Field, CA USA

[3] US Army DEVCOM AvMC DSE, Moffett Field, CA USA

[4] DoD HPCMP, Vicksburg, MS USA

[5] Oak Ridge Leadership Comp Facil, Oak Ridge, TN USA

Source:

PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021) | 2021

Keywords:

surrogate; inference; production; HPC

DOI:

10.1109/MLHPC54614.2021.00008

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We explore how to optimally deploy several types of machine-learned surrogate models used in rotorcraft aerodynamics on HPC. We first developed three rotorcraft models spanning three orders of magnitude in size (2M, 44M, and 212M trainable parameters) to use as test models. We then developed a benchmark, which we call "smiBench", that uses synthetic data to test a wide range of alternative configurations and study optimal deployment scenarios. We found that the optimal deployment scenario depends on the model size and the inference frequency. For most cases, it makes sense to use multiple inference servers, each bound to a GPU, with a load balancer distributing the requests across the GPUs. We tested three types of inference server deployments: (1) a custom Flask-based HTTP inference server, (2) TensorFlow Serving with the gRPC protocol, and (3) a RedisAI server with SmartRedis clients using the RESP protocol. We also tested three load balancing techniques for multi-GPU inferencing: (1) a Python concurrent.futures thread pool, (2) HAProxy, and (3) mpi4py. We investigated deployments on both DoD HPCMP's SCOUT and DoE OLCF's Summit POWER9 supercomputers, demonstrated the ability to run inference on a million samples per second using 192 GPUs, and studied multiple scenarios on both Nvidia T4 and V100 GPUs. Moreover, we studied a range of concurrency levels, on both the client side and the server side, and provide optimal configuration advice based on the type of deployment. Finally, we provide a simple Python-based framework for benchmarking machine-learned surrogate models using the various inference servers.
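The first load balancing technique named in the abstract, a Python concurrent.futures thread pool, can be illustrated with a minimal sketch. This is not the smiBench implementation: the endpoint addresses, the `make_round_robin_dispatcher` helper, and the `fake_send` stand-in are all hypothetical names chosen for illustration, and a real deployment would replace `fake_send` with an HTTP, gRPC, or RESP client call.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def make_round_robin_dispatcher(endpoints, send):
    """Return a function that fans batches out across endpoints with a thread pool.

    `endpoints` is a list of hypothetical server addresses (one per GPU) and
    `send(endpoint, batch)` is a caller-supplied function that performs the
    actual request; both are assumptions of this sketch.
    """
    def dispatch(batches, max_workers=None):
        workers = max_workers or len(endpoints)
        # Pair each batch with an endpoint in round-robin order.
        assignments = list(zip(cycle(endpoints), batches))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map returns results in the same order as the inputs.
            return list(pool.map(lambda pair: send(*pair), assignments))
    return dispatch


def fake_send(endpoint, batch):
    """Stand-in for a real inference call: echoes the endpoint and a dummy result."""
    return endpoint, [2.0 * x for x in batch]


if __name__ == "__main__":
    endpoints = ["gpu0:8500", "gpu1:8500"]  # hypothetical addresses
    dispatch = make_round_robin_dispatcher(endpoints, fake_send)
    for endpoint, result in dispatch([[1.0], [2.0], [3.0]]):
        print(endpoint, result)
```

With one worker thread per endpoint, each inference server in this sketch receives roughly the same number of batches; a proxy-based balancer such as HAProxy would move that distribution logic out of the client.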


Pages: 21-32

Page count: 12
