Global optimal partitioning of parallel loops for minimal data movement in limited memory embedded systems

Authors
Lin, J [1]
Lin, XL [1]
Affiliation
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Embedded systems often have limited memory, yet many of the applications that run on them are memory-intensive. Reducing the overhead of data movement between global memory and distributed local memory in such a system is critical to the performance of these applications. In this paper, we propose a unified theoretical framework for automatically partitioning parallel loops to optimize data movement on such systems. We first introduce the notion of data movement and build a simple but accurate data movement model that estimates the data movement overhead for a footprint. We then present an algorithm that derives an optimal loop partitioning, minimizing the amount of data movement across the loop nests. We have implemented the framework in a parallel compiler on VE16, a commercial limited-memory embedded system, and the experimental results demonstrate the efficiency of the proposed method.
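
The abstract does not spell out the model or the algorithm, so the following is only a minimal sketch of the general idea of footprint-based partitioning, not the authors' method. It assumes a hypothetical two-level loop nest in which iteration (i, j) reads A[i][j], A[i+1][j], A[i][j+1], and A[i+1][j+1], rectangular blocks whose sides divide the loop bounds, and a cost equal to the number of array elements copied into local memory. The names `footprint` and `best_partition` are illustrative.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, used as candidate block side lengths."""
    return [d for d in range(1, n + 1) if n % d == 0]


def footprint(bi, bj):
    """Distinct elements of A touched by one bi x bj block of iterations.

    Assumed access pattern: iteration (i, j) reads A[i][j], A[i+1][j],
    A[i][j+1] and A[i+1][j+1], so a block touches a (bi+1) x (bj+1) region.
    """
    return (bi + 1) * (bj + 1)


def best_partition(n, m, capacity):
    """Exhaustively pick the block shape that minimizes total data movement.

    n, m      -- loop bounds of the (assumed) two-level parallel loop nest
    capacity  -- local memory size in array elements (an assumption)
    Returns (bi, bj, elements_moved), or None if no block fits in memory.
    """
    best = None
    for bi, bj in product(divisors(n), divisors(m)):
        fp = footprint(bi, bj)
        if fp > capacity:
            continue  # this block's footprint does not fit in local memory
        # Every block loads its whole footprint from global memory once.
        moved = (n // bi) * (m // bj) * fp
        candidate = (moved, bi, bj)
        if best is None or candidate < best:
            best = candidate
    if best is None:
        return None
    moved, bi, bj = best
    return bi, bj, moved


if __name__ == "__main__":
    print(best_partition(64, 64, 300))  # (16, 16, 4624)
```

For a 64 x 64 iteration space and room for 300 elements, the search settles on square 16 x 16 blocks: elongated blocks that also fit, such as 32 x 8, carry proportionally more boundary data per iteration. A real compiler would derive footprints from the array reference functions rather than hard-code them.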
Pages: 3-9
Number of pages: 7