Deep learning parallel computing and evaluation for embedded system clustering architecture processor

Cited by: 4
Author
Zu, Yue [1]
Affiliation
[1] Jilin Inst Chem Technol, Dept Human Resources Off, Jilin 132022, Jilin, Peoples R China
Keywords
Clustered architecture processor; Parallel computing; Deep learning; Performance evaluation
DOI
10.1007/s10617-020-09235-5
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In the era of intelligent computing, processing large volumes of information and running a growing range of intelligent applications increasingly depend on embedded devices, and machine learning algorithms play an ever more important role in this trend. High-performance embedded computing is an effective way to compensate for the limited computing power of embedded devices. New intelligent embedded applications built on machine learning are computationally demanding, and traditional embedded systems struggle to meet their needs. To address this problem, this paper studies parallel optimization and implementation techniques for convolutional neural networks (CNNs) on the Parallella platform. It first presents a parallel optimization strategy for CNNs on the clustered architecture processor of a heterogeneous multi-core system. It then studies a high-performance implementation of CNNs on the Parallella platform and implements a functional CNN system. A set of performance evaluation methods for embedded parallel processors is also proposed. Targeting applications on the S698P processor, the eCos operating system is selected as the software platform; the single-core and multi-core modes are compared on the GRSIM simulator, and a parallel performance evaluation is given. Experiments show that the efficiency of deep learning tasks is significantly improved compared with traditional parallel methods.
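The abstract does not spell out the partitioning scheme or the evaluation metrics. As a minimal sketch only, assuming a simple data-parallel strategy in which the output rows of a convolution layer are split into contiguous blocks (one per core), and assuming the common speedup S = T1 / Tp and efficiency E = S / p metrics for comparing single-core and multi-core runs, the idea can be illustrated in Python. All function names are hypothetical, the blocks run sequentially here rather than on real cores, and this is not the author's implementation for Parallella or the S698P.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_rows(image: Matrix, kernel: Matrix, row_start: int, row_end: int) -> Matrix:
    """Valid (no padding, stride 1) 2-D cross-correlation, the core operation of a
    CNN convolution layer, computed only for output rows [row_start, row_end)."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    rows: Matrix = []
    for i in range(row_start, row_end):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        rows.append(row)
    return rows


def partition_rows(n_rows: int, n_cores: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split n_rows output rows into n_cores contiguous
    blocks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_cores)
    bounds, start = [], 0
    for core in range(n_cores):
        end = start + base + (1 if core < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def conv2d_partitioned(image: Matrix, kernel: Matrix, n_cores: int) -> Matrix:
    """Compute the layer block by block. Each block is independent and could be
    assigned to its own core; in this sketch the blocks run one after another."""
    out_h = len(image) - len(kernel) + 1
    result: Matrix = []
    for start, end in partition_rows(out_h, n_cores):
        result.extend(conv2d_rows(image, kernel, start, end))  # gather in row order
    return result


def speedup(t_single: float, t_multi: float) -> float:
    """S = T1 / Tp: single-core run time divided by multi-core run time."""
    return t_single / t_multi


def efficiency(t_single: float, t_multi: float, n_cores: int) -> float:
    """E = S / p: fraction of the ideal linear speedup achieved on p cores."""
    return speedup(t_single, t_multi) / n_cores
```

Because each row block reads only its own window of the input and writes disjoint output rows, the blocks need no synchronization beyond a final gather. On a real multi-core target, each block would be dispatched to a separate core, and the single-core and multi-core run times measured to obtain T1 and Tp.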
Pages: 145-159
Page count: 15