Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC

Cited by: 3
Authors
Brewer, Wesley [1 ]
Martinez, Daniel [2 ]
Boyer, Mathew [1 ]
Jude, Dylan [3 ]
Wissink, Andy [3 ]
Parsons, Ben [4 ]
Yin, Junqi [5 ]
Anantharaj, Valentine [5 ]
Affiliations
[1] DoD HPCMP PET GDIT, Vicksburg, MS 39335 USA
[2] Sci & Technol Corp, Moffett Field, CA USA
[3] US Army DEVCOM AvMC DSE, Moffett Field, CA USA
[4] DoD HPCMP, Vicksburg, MS USA
[5] Oak Ridge Leadership Comp Facil, Oak Ridge, TN USA
Keywords
surrogate; inference; production; HPC
DOI
10.1109/MLHPC54614.2021.00008
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We explore how to optimally deploy several types of machine-learned surrogate models used in rotorcraft aerodynamics on HPC. We first developed three rotorcraft surrogate models spanning three orders of magnitude in size (2M, 44M, and 212M trainable parameters) to use as test models. We then developed a benchmark, which we call "smiBench", that uses synthetic data to test a wide range of alternative configurations and study optimal deployment scenarios. We found that the optimal deployment scenario depends on model size and inference frequency. In most cases, it makes sense to run multiple inference servers, each bound to a GPU, with a load balancer distributing requests across the GPUs. We tested three types of inference server deployments: (1) a custom Flask-based HTTP inference server, (2) TensorFlow Serving using the gRPC protocol, and (3) a RedisAI server with SmartRedis clients using the RESP protocol. We also tested three load-balancing techniques for multi-GPU inferencing: (1) a Python concurrent.futures thread pool, (2) HAProxy, and (3) mpi4py. We investigated deployments on both DoD HPCMP's SCOUT and DOE OLCF's Summit POWER9 supercomputers, demonstrated inference on a million samples per second using 192 GPUs, and studied multiple scenarios on both NVIDIA T4 and V100 GPUs. Moreover, we studied a range of concurrency levels, on both the client side and the server side, and provide optimal configuration advice based on the type of deployment. Finally, we provide a simple Python-based framework for benchmarking machine-learned surrogate models using the various inference servers.
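To make the load-balancing idea concrete, below is a minimal Python sketch (not the authors' smiBench code) of the first technique mentioned in the abstract: a concurrent.futures thread pool fanning inference requests out to several GPU-bound HTTP inference servers. The endpoint URLs, port scheme, JSON payload shape, and batch sizes are illustrative assumptions, not details taken from the paper.

```python
# Sketch: thread-pool load balancing of synthetic inference requests across
# several GPU-bound HTTP inference servers (hypothetical endpoints).
import concurrent.futures
import json
import urllib.request

import numpy as np

# Assumption: one inference server per GPU, each listening on its own port.
SERVERS = [f"http://localhost:{8500 + i}/predict" for i in range(4)]


def predict(url, batch):
    """POST one batch of inputs to an inference server and return its JSON reply."""
    payload = json.dumps({"inputs": batch.tolist()}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def run(num_batches=64, batch_size=1024, num_features=8):
    # Synthetic inputs, mirroring the benchmark's use of synthetic data.
    batches = [
        np.random.rand(batch_size, num_features).astype(np.float32)
        for _ in range(num_batches)
    ]
    results = []
    # The thread pool acts as a simple round-robin load balancer over the servers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        futures = [
            pool.submit(predict, SERVERS[i % len(SERVERS)], batch)
            for i, batch in enumerate(batches)
        ]
        for fut in concurrent.futures.as_completed(futures):
            results.append(fut.result())
    return results


if __name__ == "__main__":
    run()
```

The same client-side pattern could sit behind HAProxy (a single frontend URL instead of an explicit server list) or be replaced by mpi4py ranks, each talking to its local GPU server, which are the other two load-balancing options compared in the paper.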
Pages: 21-32
Page count: 12