Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC

Cited by: 3
Authors
Brewer, Wesley [1 ]
Martinez, Daniel [2 ]
Boyer, Mathew [1 ]
Jude, Dylan [3 ]
Wissink, Andy [3 ]
Parsons, Ben [4 ]
Yin, Junqi [5 ]
Anantharaj, Valentine [5 ]
Affiliations
[1] DoD HPCMP PET GDIT, Vicksburg, MS 39335 USA
[2] Sci & Technol Corp, Moffett Field, CA USA
[3] US Army DEVCOM AvMC DSE, Moffett Field, CA USA
[4] DoD HPCMP, Vicksburg, MS USA
[5] Oak Ridge Leadership Comp Facil, Oak Ridge, TN USA
Keywords
surrogate; inference; production; HPC
DOI
10.1109/MLHPC54614.2021.00008
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We explore how to optimally deploy several types of machine-learned surrogate models used in rotorcraft aerodynamics on HPC. We first developed three rotorcraft surrogate models spanning three orders of magnitude in size (2M, 44M, and 212M trainable parameters) to use as test models. We then developed a benchmark, which we call "smiBench", that uses synthetic data to test a wide range of alternative configurations and study optimal deployment scenarios. We found that the optimal deployment scenario depends on model size and inference frequency. In most cases, it makes sense to run multiple inference servers, each bound to a GPU, with a load balancer distributing requests across the GPUs. We tested three types of inference server deployments: (1) a custom Flask-based HTTP inference server, (2) TensorFlow Serving using the gRPC protocol, and (3) a RedisAI server with SmartRedis clients using the RESP protocol. We also tested three load-balancing techniques for multi-GPU inferencing: (1) a Python concurrent.futures thread pool, (2) HAProxy, and (3) mpi4py. We investigated deployments on both DoD HPCMP's SCOUT and DOE OLCF's Summit POWER9 supercomputers, demonstrated inference on a million samples per second using 192 GPUs, and studied multiple scenarios on both NVIDIA T4 and V100 GPUs. Moreover, we studied a range of concurrency levels, on both the client side and the server side, and provide optimal configuration advice based on the type of deployment. Finally, we provide a simple Python-based framework for benchmarking machine-learned surrogate models using the various inference servers.
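To make the load-balancing idea concrete, below is a minimal Python sketch (not the authors' smiBench code) of the first technique mentioned in the abstract: a concurrent.futures thread pool fanning inference requests out to several GPU-bound HTTP inference servers. The endpoint URLs, port scheme, JSON payload shape, and batch sizes are illustrative assumptions, not details taken from the paper.

```python
# Sketch: thread-pool load balancing of synthetic inference requests across
# several GPU-bound HTTP inference servers (hypothetical endpoints).
import concurrent.futures
import json
import urllib.request

import numpy as np

# Assumption: one inference server per GPU, each listening on its own port.
SERVERS = [f"http://localhost:{8500 + i}/predict" for i in range(4)]


def predict(url, batch):
    """POST one batch of inputs to an inference server and return its JSON reply."""
    payload = json.dumps({"inputs": batch.tolist()}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def run(num_batches=64, batch_size=1024, num_features=8):
    # Synthetic inputs, mirroring the benchmark's use of synthetic data.
    batches = [
        np.random.rand(batch_size, num_features).astype(np.float32)
        for _ in range(num_batches)
    ]
    results = []
    # The thread pool acts as a simple round-robin load balancer over the servers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        futures = [
            pool.submit(predict, SERVERS[i % len(SERVERS)], batch)
            for i, batch in enumerate(batches)
        ]
        for fut in concurrent.futures.as_completed(futures):
            results.append(fut.result())
    return results


if __name__ == "__main__":
    run()
```

The same client-side pattern could sit behind HAProxy (a single frontend URL instead of an explicit server list) or be replaced by mpi4py ranks, each talking to its local GPU server, which are the other two load-balancing options compared in the paper.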
Pages: 21-32
Page count: 12