Partitioning Streaming Parallelism for Multi-cores: A Machine Learning Based Approach

Cited by: 0
Authors
Wang, Zheng [1]
O'Boyle, Michael F. P. [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Inst Comp Syst Architecture, Edinburgh EH8 9YL, Midlothian, Scotland
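Keywords
Compiler Optimization; Machine Learning; Partitioning Streaming Parallelism
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract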
Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared it against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we obtain a 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.
Pages: 307-318
Number of pages: 12