Multicore processors and GPUs: the power of parallel computing in the Cloud

Authors
Bennett, Kelly W. [1]
Robertson, James [2]
Affiliations
[1] US Army Res Lab, Sensors & Electron Devices Directorate, Adelphi, MD 20783 USA
[2] Clearhaven Technol LLC, Severna Pk, MD 21146 USA
Keywords
Amazon Web Services; Cloud Computing; AI/ML algorithms; Cloud-based GPU platform; Big data; Disruptive Computing Technologies; AWS SageMaker
DOI
10.1117/12.2558600
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sensors used in intelligence, surveillance and reconnaissance (ISR) operations and activities can generate vast amounts of data. High-volume analytical capabilities are needed to process data from multi-modal sensors in order to develop and test complex computational and deep learning models in support of U.S. Army Multi-Domain Operations (MDO). The Army Research Laboratory designs, develops and tests Artificial Intelligence and Machine Learning (AI/ML) algorithms employing large repositories of in-house data. To process the data efficiently and to design, build, train and deploy models, parallel and distributed algorithms are needed. Deep learning frameworks provide language-specific, container-based building blocks associated with deep learning neural networks applied to specific target applications. This paper discusses applications of AI/ML deep learning frameworks and Software Development Kits (SDKs), and demonstrates and compares specific multi-core processor and NVIDIA Graphics Processing Unit (GPU) implementations for desktop and Cloud environments. Frameworks selected for this research include PyTorch and MATLAB. Amazon Web Services (AWS) SageMaker was used to launch Machine Learning instances ranging from general-purpose computing instances to GPU instances. Detailed processes, example code, performance enhancements, best practices and lessons learned are included for publicly available acoustic and image datasets. Research results indicate that parallel implementations of the data preprocessing steps saved significant time, but that more expensive GPUs did not provide any processing-time advantage for the machine learning algorithms tested.
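The abstract reports that parallelizing the data preprocessing steps saved significant time. This record does not include the paper's code, so the following is only a minimal Python sketch of that general idea under stated assumptions: `normalize_rms` is a hypothetical stand-in for a per-recording preprocessing step, and `preprocess_serial`, `preprocess_parallel`, `workers`, and `chunksize` are illustrative names and parameters, not the authors' implementation.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def normalize_rms(signal):
    """Hypothetical preprocessing step: remove the mean, scale to unit RMS."""
    n = len(signal)
    if n == 0:
        return []
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    rms = math.sqrt(sum(x * x for x in centered) / n)
    if rms == 0.0:
        # A constant signal has no energy after centering; return zeros.
        return centered
    return [x / rms for x in centered]


def preprocess_serial(signals):
    """Baseline: preprocess one recording after another on a single core."""
    return [normalize_rms(s) for s in signals]


def preprocess_parallel(signals, workers=4, chunksize=8,
                        executor_cls=ProcessPoolExecutor):
    """Preprocess recordings concurrently, preserving input order.

    Each recording is independent, so the work can be spread over worker
    processes. ProcessPoolExecutor suits CPU-bound pure-Python work;
    ThreadPoolExecutor can be passed instead for I/O-bound steps.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(normalize_rms, signals, chunksize=chunksize))
```

When using the default process pool, call `preprocess_parallel` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Any speedup depends on the cost of each step relative to the overhead of sending data between processes, so it is worth timing both functions on representative data.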
Pages: 13