AIPerf: Automated Machine Learning as an AI-HPC Benchmark

Times cited: 10
Authors
Ren, Zhixiang [1]
Liu, Yongheng [1]
Shi, Tianhui [2]
Xie, Lei [2]
Zhou, Yue [1]
Zhai, Jidong [2]
Zhang, Youhui [2]
Zhang, Yunquan [3]
Chen, Wenguang [2]
Affiliations
[1] Peng Cheng National Laboratory, Shenzhen 518000, People's Republic of China
[2] Tsinghua University, Department of Computer Science and Technology, Beijing 100084, People's Republic of China
[3] Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100086, People's Republic of China
Keywords
High-Performance Computing (HPC); Artificial Intelligence (AI); automated machine learning; SYSTEMS
DOI
10.26599/BDMA.2021.9020004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The abundance of complex Artificial Intelligence (AI) algorithms and the available High-Performance Computing (HPC) power are driving the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, LINPACK, the de facto HPC benchmark, cannot reflect AI computing power or input/output performance because it lacks a representative AI workload. Popular AI benchmarks such as MLPerf have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite based on automated machine learning, which not only represents real AI scenarios but also scales automatically and adaptively to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and leave room for optimization on diverse systems with customizable configurations. We use Operations Per Second (OPS), measured with an analytical and systematic approach, as the major metric to quantify AI performance. We evaluate the benchmark on various systems to verify its stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 processors (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale on and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
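As a rough illustration of the metric described in the abstract, OPS can be read as the number of operations a training run performs, counted analytically from the model's structure, divided by the elapsed time. The Python sketch below is a minimal example under stated assumptions, not AIPerf's implementation: `Conv2D`, `training_ops`, `ops_per_second`, and `weak_scaling_efficiency` are hypothetical names, one multiply-accumulate is counted as 2 operations, and the backward pass is assumed to cost twice the forward pass. The last lines only divide the two quoted endpoint results by their accelerator counts; because the endpoints use different hardware, these per-device figures are not themselves a weak-scaling measurement.

```python
# Hypothetical sketch of an analytically counted OPS metric (not AIPerf's code).
from dataclasses import dataclass


@dataclass
class Conv2D:
    """Shape of one 2D convolution layer (hypothetical helper)."""
    in_channels: int
    out_channels: int
    kernel: int
    out_h: int
    out_w: int

    def forward_ops(self) -> int:
        # Assumption: one multiply plus one add per weight per output element.
        return (2 * self.kernel * self.kernel * self.in_channels
                * self.out_channels * self.out_h * self.out_w)


def training_ops(layers, samples: int, backward_factor: float = 2.0) -> float:
    """Total operations to train on `samples` inputs.

    Assumes the backward pass costs `backward_factor` times the forward
    pass, a common rule of thumb rather than a value from the paper.
    """
    forward = sum(layer.forward_ops() for layer in layers)
    return forward * (1.0 + backward_factor) * samples


def ops_per_second(total_ops: float, seconds: float) -> float:
    """The figure of merit: counted operations divided by wall-clock time."""
    return total_ops / seconds


def weak_scaling_efficiency(ops_base: float, n_base: int,
                            ops_n: float, n: int) -> float:
    """Per-node throughput at n nodes relative to a baseline on the same hardware.

    1.0 means perfectly linear weak scaling.
    """
    return (ops_n / n) / (ops_base / n_base)


# Per-accelerator throughput implied by the two endpoints quoted in the abstract.
t4_per_device = 56.1e12 / 32            # OPS per NVIDIA Tesla T4
ascend_per_device = 194.53e15 / 4096    # OPS per Huawei Ascend 910

print(f"{t4_per_device / 1e12:.2f} TOPS per T4")              # 1.75 TOPS per T4
print(f"{ascend_per_device / 1e12:.2f} TOPS per Ascend 910")  # 47.49 TOPS per Ascend 910
```

Weak-scaling efficiency compares per-node throughput at N nodes with a smaller baseline on the same hardware while the workload grows with the machine; a value close to 1.0 corresponds to the near-linear behavior the abstract reports.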
Pages: 208-220
Number of pages: 13
Related papers (50 in total)
  • [31] AI SCOPE - OPEN SOURCE AUTOMATED MICROSCOPY USING MACHINE LEARNING FOR MALARIA DIAGNOSIS
    Peire Paredes, Eduardo
    Moro, Laura
    Perez Tanoira, Ramon
    Cieslik, Jakub
    Wagemans, Wiebe
    Singh, Raminderpal
    Salazar Sanchez, Ernesto
    Pons, Maria
    Quispe, Antonio
    AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE, 2018, 99(04): 11
  • [32] AI and Machine Learning for RT
    Jiang, Steve
    Xing, L.
    El Naqa, I.
    Li, H.
    MEDICAL PHYSICS, 2019, 46(06): E497
  • [33] On AI, Markets and Machine Learning
    Parkes, David C.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017: 2
  • [34] Machine Learning and AI in the Sciences
    Stühmer, Jan
    KI - Künstliche Intelligenz, 2024, 38(03): 113-114
  • [35] Simplifying AI and machine learning
    Siegel, Eliot
    APPLIED RADIOLOGY, 2018, 47(05): 26-28
  • [36] Prediction of HPC compressive strength based on machine learning
    Jin, Libing
    Duan, Jie
    Jin, Yichen
    Xue, Pengfei
    Zhou, Pin
    SCIENTIFIC REPORTS, 2024, 14(01)
  • [37] Optimizing Machine Learning on Apache Spark in HPC Environments
    Li, Zhenyu
    Davis, James
    Jarvis, Stephen A.
    PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018), 2018: 95-105
  • [38] An Overview of Machine Learning and HPC in Open Sources for Bioinformatics
    Tsai, Yin-Te
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018: 1338-1342
  • [39] Ergonomics in AI: Designing and Interacting With Machine Learning and AI
    Lau, Nathan
    Hildebrandt, Michael
    Jeon, Myounghoon
    ERGONOMICS IN DESIGN, 2020, 28(03): 3
  • [40] Benchmark of Automated Machine Learning with State-of-the-Art Image Segmentation Algorithms for Tool Condition Monitoring
    Lutz, B.
    Reisch, R.
    Kisskalt, D.
    Avci, B.
    Regulin, D.
    Knoll, A.
    Franke, J.
    30TH INTERNATIONAL CONFERENCE ON FLEXIBLE AUTOMATION AND INTELLIGENT MANUFACTURING (FAIM2021), 2020, 51: 215-221