ParaML: A Polyvalent Multicore Accelerator for Machine Learning

Citations: 3
Authors
Zhou, Shengyuan [1 ,2 ]
Guo, Qi [1 ,3 ]
Du, Zidong [1 ,3 ]
Liu, Daofu [1 ,3 ]
Chen, Tianshi [1 ,3 ,4 ]
Li, Ling [5 ]
Liu, Shaoli [1 ,3 ]
Zhou, Jinhong [1 ,3 ]
Temam, Olivier [6 ]
Feng, Xiaobing [7 ]
Zhou, Xuehai [8 ]
Chen, Yunji [1 ,2 ,4 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Intelligent Processor Res Ctr, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
[3] Cambricon Technol Corp Ltd, Beijing 100191, Peoples R China
[4] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100190, Peoples R China
[5] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[6] Inria Saclay, F-91120 Palaiseau, France
[7] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
[8] Univ Sci & Technol China, Hefei 230026, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Neural networks; Machine learning; Testing; Support vector machines; Linear regression; Computers; Computer architecture; Accelerator; machine learning (ML) techniques; multicore accelerator;
DOI
10.1109/TCAD.2019.2927523
CLC Number
TP3 [computing technology; computer technology];
Discipline Code
0812;
Abstract
In recent years, machine learning (ML) techniques have proven to be powerful tools in various emerging applications. Traditionally, ML techniques are run on general-purpose CPUs and GPUs, but the energy efficiency of these platforms is limited by their excessive support for flexibility. Hardware accelerators are an efficient alternative to CPUs/GPUs, yet they are still limited in that they often accommodate only a single ML technique (or family of techniques). However, different problems may require different ML techniques, so such accelerators may achieve poor learning accuracy or even be ineffective. In this paper, we present ParaML, a polyvalent accelerator architecture integrating multiple processing cores, which accommodates ten representative ML techniques: k-means, k-nearest neighbors (k-NN), naive Bayes (NB), support vector machine (SVM), linear regression (LR), classification tree (CT), deep neural network (DNN), learning vector quantization (LVQ), Parzen window (PW), and principal component analysis (PCA). Benefiting from a thorough analysis of the computational primitives and locality properties of these ML techniques, the single-core ParaML can perform up to 1056 GOP/s (e.g., additions and multiplications) in an area of 3.51 mm² while consuming only 596 mW, as estimated by ICC and PrimeTime PX, respectively, on the post-synthesis netlist. Compared with the NVIDIA K20M GPU (28-nm process), the single-core ParaML (65-nm process) is 1.21x faster and reduces energy by 137.93x. We also compare the single-core ParaML with other accelerators. Compared with PRINS, the single-core ParaML achieves 72.09x and 2.57x energy benefits for k-NN and k-means, respectively, and speeds up each k-NN query by 44.76x. Compared with EIE, the single-core ParaML achieves a 5.02x speedup and a 4.97x energy benefit with 11.62x less area when evaluated on a dense DNN. Compared with the TPU, the single-core ParaML achieves 2.45x better power efficiency (5647 GOP/W versus 2300 GOP/W) with 321.36x less area. Compared with the single-core version, the 8-core ParaML further improves performance by up to 3.98x, with an area of 13.44 mm² and a power of 2036 mW.
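The premise behind a polyvalent design is that superficially different ML techniques share a small set of computational primitives, so one datapath can serve all of them. As a minimal illustration of that observation (this sketch is not taken from the paper, and all function names are hypothetical), the C fragment below shows how linear prediction (LR, linear SVM) and the distance kernels at the heart of k-NN, k-means, and LVQ all reduce to the same multiply-accumulate loop:

#include <stddef.h>

/* Hypothetical illustration: a shared multiply-accumulate primitive. */
static double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];      /* one multiplication + one addition */
    return acc;
}

/* Linear regression / linear-SVM decision value: w.x + bias. */
double linear_predict(const double *w, double bias, const double *x, size_t n)
{
    return dot(w, x, n) + bias;
}

/* Squared Euclidean distance, the inner loop of k-NN, k-means, and LVQ,
 * expanded as ||x - c||^2 = x.x - 2*(x.c) + c.c -- again only dot products. */
double squared_distance(const double *x, const double *c, size_t n)
{
    return dot(x, x, n) - 2.0 * dot(x, c, n) + dot(c, c, n);
}

This view suggests why, as the abstract claims, a common arithmetic datapath with buffers tuned to the techniques' locality patterns can be reused across all ten techniques, with only the surrounding control and a few technique-specific operators differing.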
Pages: 1764-1777
Number of pages: 14