Enhancing Data Reuse in Cache Contention Aware Thread Scheduling on GPGPU

Cited by: 0
Authors
Lu, Chin-Fu [1]
Kuo, Hsien-Kai [2]
Lai, Bo-Cheng Charles [3]
Affiliations
[1] Marvell Taiwan Ltd, Hsinchu, Taiwan
[2] MediaTek Inc, Hsinchu, Taiwan
[3] Natl Chiao Tung Univ, Hsinchu, Taiwan
Keywords
GPGPU; cache; thread scheduling; performance
DOI
10.1109/CISIS.2016.132
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
GPGPUs have been widely adopted as throughput processing platforms for modern big-data and cloud computing. Attaining high performance on a GPGPU requires careful tradeoffs among various design concerns. Data reuse, cache contention, and thread-level parallelism have been demonstrated to be three imperative performance factors for a GPGPU. The correlated performance impacts of these factors pose non-trivial concerns when scheduling threads on GPGPUs. This paper proposes a three-stage scheduling scheme that co-schedules threads with all three factors taken into account. Experimental results on a set of irregular parallel applications demonstrate up to a 70% execution time improvement over previous approaches.
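To illustrate the general idea behind this kind of co-scheduling, the sketch below is a minimal, hypothetical approximation, not the paper's actual three-stage algorithm: thread blocks are greedily grouped by their estimated shared data footprint, and a group stops growing once its combined working set would exceed an assumed per-SM cache budget, provided an assumed minimum level of thread-level parallelism has already been reached. All names, capacities, and thresholds (CACHE_LINE_BYTES, CACHE_BUDGET, MIN_TLP, the "lines" footprint model) are illustrative assumptions.

# Hypothetical sketch of cache-contention-aware thread block grouping.
# Not the paper's algorithm; all constants and data structures are assumed.

CACHE_LINE_BYTES = 128        # assumed cache line size (bytes)
CACHE_BUDGET = 16 * 1024      # assumed per-SM cache capacity (bytes)
MIN_TLP = 2                   # assumed minimum co-scheduled blocks per SM


def shared_footprint(block, group):
    """Bytes of cache lines the block is estimated to share with the group."""
    group_lines = set().union(*(g["lines"] for g in group))
    return len(block["lines"] & group_lines) * CACHE_LINE_BYTES


def co_schedule(blocks):
    """Greedily group thread blocks so each group balances data reuse,
    cache contention, and thread-level parallelism."""
    groups = []
    pending = list(blocks)
    while pending:
        group = [pending.pop(0)]
        while pending:
            # Pick the pending block that reuses the most data with the group.
            best = max(pending, key=lambda b: shared_footprint(b, group))
            lines = set().union(*(g["lines"] for g in group)) | best["lines"]
            footprint = len(lines) * CACHE_LINE_BYTES
            # Once minimum parallelism is met, stop growing the group if it
            # would overflow the cache budget or adds no data reuse.
            if len(group) >= MIN_TLP and (
                footprint > CACHE_BUDGET
                or shared_footprint(best, group) == 0
            ):
                break
            group.append(best)
            pending.remove(best)
        groups.append(group)
    return groups


# Hypothetical usage: each thread block is described by the set of cache
# lines it is expected to touch.
example = [{"lines": {1, 2, 3}}, {"lines": {2, 3, 4}}, {"lines": {90, 91}}]
print(co_schedule(example))

Under these assumptions, a per-SM dispatcher would issue one group at a time, so that co-resident blocks share cached data rather than compete for capacity; the paper's three-stage scheme should be consulted for the actual heuristics.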
Pages: 351 - 356 (6 pages)
Related Papers (50 in total)
  • [1] Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling. Chen, Li-Jhan; Cheng, Hsiang-Yun; Wang, Po-Han; Yang, Chia-Lin. IEEE Computer Architecture Letters, 2017, 16(2): 127-131.
  • [2] Cache-Hierarchy Contention-Aware Scheduling in CMPs. Feliu, Josue; Petit, Salvador; Sahuquillo, Julio; Duato, Jose. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(3): 581-590.
  • [3] Thread Affinity Mapping for Irregular Data Access on Shared Cache GPGPU. Kuo, Hsien-Kai; Chen, Kuan-Ting; Lai, Bo-Cheng Charles; Jou, Jing-Yang. 2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC), 2012: 659-664.
  • [4] OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance. Jog, Adwait; Kayiran, Onur; Nachiappan, Nachiappan Chidambaram; Mishra, Asit K.; Kandemir, Mahmut T.; Mutlu, Onur; Iyer, Ravishankar; Das, Chita R. ACM SIGPLAN Notices, 2013, 48(4): 395-406.
  • [5] A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts. Sato, Masayuki; Egawa, Ryusuke; Takizawa, Hiroyuki; Kobayashi, Hiroaki. IEICE Transactions on Information and Systems, 2013, E96-D(9): 2047-2054.
  • [6] Shuffling: A Framework for Lock Contention Aware Thread Scheduling for Multicore Multiprocessor Systems. Pusukuri, Kishore Kumar; Gupta, Rajiv; Bhuyan, Laxmi N. Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT'14), 2014: 289-300.
  • [7] Inter-kernel Reuse-aware Thread Block Scheduling. Huzaifa, Muhammad; Alsop, Johnathan; Mahmoud, Abdulrahman; Salvador, Giordano; Sinclair, Matthew D.; Adve, Sarita V. ACM Transactions on Architecture and Code Optimization, 2020, 17(3).
  • [8] Lock Contention Aware Thread Migrations. Pusukuri, Kishore Kumar; Gupta, Rajiv; Bhuyan, Laxmi Narayan. ACM SIGPLAN Notices, 2014, 49(8): 369-370.
  • [9] Thread scheduling for cache locality. Philbin, J.; Edler, J.; Anshus, O. J.; Douglas, C. C.; Li, K. ACM SIGPLAN Notices, 1996, 31(9): 60-71.
  • [10] Orchestrating Cache Management and Memory Scheduling for GPGPU Applications. Mu, Shuai; Deng, Yandong; Chen, Yubei; Li, Huaiming; Pan, Jianming; Zhang, Wenjun; Wang, Zhihua. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2014, 22(8): 1803-1814.