Locality optimizations for Jacobi iteration on distributed parallel systems

Cited by: 0

Authors:

Che, YG [1]

Wang, ZH

Li, XM

Yang, LT

Affiliations:

[1] Natl Univ Def Technol, Sch Comp, Changsha 410073, Peoples R China

[2] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada

[3] Inst Equipment & Command Technol, Beijing, Peoples R China

Source:

PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS | 2004 / Vol. 3358

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

In this paper, we propose an inter-nest cache reuse optimization method for Jacobi codes. The method is easy to apply yet effective: it enhances the cache locality of Jacobi codes while preserving their coarse-grain parallelism. We compare our method with two previous locality enhancement techniques that can be used for Jacobi codes: time skewing and new tiling. We quantitatively analyze the main contributing factors to the runtime of the different Jacobi codes, and we perform experiments on a PC cluster to verify our analysis. The results show that our method performs worse than time skewing and new tiling on a uniprocessor, but better on a distributed parallel system.
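
For readers unfamiliar with the term, a Jacobi code is commonly written as two loop nests inside a time loop: one nest computes new values into a scratch array, and a second nest copies them back. The sketch below is an illustrative baseline only, not the optimization proposed in the paper; the function name `jacobi_1d` and the 1-D two-point averaging stencil are assumptions chosen for brevity. It shows where reuse between the two nests could arise, since the copy nest re-reads data the compute nest has just written.

```python
def jacobi_1d(a, steps):
    """Baseline 1-D Jacobi sweep (hypothetical example, not the paper's code)."""
    n = len(a)
    a = list(a)  # work on a copy so the caller's data is untouched
    b = list(a)  # scratch array; boundary values stay fixed
    for _ in range(steps):
        # Nest 1: each interior point becomes the average of its two neighbours.
        for i in range(1, n - 1):
            b[i] = 0.5 * (a[i - 1] + a[i + 1])
        # Nest 2: copy the new values back. For large n, the start of b may
        # already have been evicted from cache by the time this nest runs.
        for i in range(1, n - 1):
            a[i] = b[i]
    return a
```

Calling `jacobi_1d([0.0, 0.0, 0.0, 8.0], 2)` runs two time steps with the end points held fixed.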

Pages: 91-104

Page count: 14
