Fast Data Delivery for Many-Core Processors

Cited by: 18

Authors:

Bakhshalipour, Mohammad [1,2]

Lotfi-Kamran, Pejman [2]

Mazloumi, Abbas [3,4]

Samandi, Farid [1,2]

Naderan-Tahan, Mahmood [5]

Modarressi, Mehdi [6]

Sarbazi-Azad, Hamid [2,7]

Affiliations:

[1] SUT, Tehran, Iran

[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran

[3] Univ Tehran, Tehran, Iran

[4] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA

[5] Shahid Chamran Univ Ahvaz SCU, Dept Comp Engn, Fac Engn, Ahvaz, Khuzestan, Iran

[6] Univ Tehran, Sch Elect & Comp Engn, Tehran, Iran

[7] SUT, Dept Comp Engn, Tehran, Iran

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2018 / Vol. 67 / Issue 10

Funding:

US National Science Foundation;

Keywords:

Memory system; network-on-chip; circuit switching; data prefetching; ON-CHIP;

DOI:

10.1109/TC.2018.2821144

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Server workloads operate on large volumes of data. As a result, processors executing these workloads encounter frequent L1-D misses. In a many-core processor, an L1-D miss causes a request packet to be sent to an LLC slice and a response packet to be sent back to the L1-D, which results in high overhead. While prior work targeted response packets, this work focuses on accelerating the request packets. Unlike aggressive OoO cores, simpler cores used in many-core processors cannot hide the latency of L1-D request packets. We observe that LLC slices that serve L1-D misses are strongly temporally correlated. Taking advantage of this observation, we design a simple and accurate predictor. Upon the occurrence of an L1-D miss, the predictor identifies the LLC slice that will serve the next L1-D miss and a circuit will be set up for the upcoming miss request to accelerate its transmission. When the upcoming miss occurs, the resulting request can use the already established circuit for transmission to the LLC slice. We show that our proposal outperforms data prefetching mechanisms in a many-core processor due to (1) higher prediction accuracy and (2) not wasting valuable off-chip bandwidth, while requiring significantly less overhead. Using full-system simulation, we show that our proposal accelerates serving data misses by 22 percent and leads to 10 percent performance improvement over the state-of-the-art network-on-chip.
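The abstract's central idea, that the LLC slice serving one L1-D miss is a good hint for the slice serving the next, can be illustrated with a toy last-successor table. This is only a sketch under stated assumptions and not the predictor described in the paper: the class name `NextSlicePredictor`, the table organization, and the `accuracy` helper are hypothetical, and circuit setup in the network-on-chip is not modeled at all.

```python
from typing import Dict, List, Optional


class NextSlicePredictor:
    """Toy predictor (hypothetical): remembers which slice followed each slice last time."""

    def __init__(self) -> None:
        # Maps an LLC slice id to the slice that served the miss right after it.
        self.successor: Dict[int, int] = {}
        self.last_slice: Optional[int] = None

    def observe(self, slice_id: int) -> Optional[int]:
        """Record the slice serving the current miss and return a guess for the next one."""
        if self.last_slice is not None:
            self.successor[self.last_slice] = slice_id
        self.last_slice = slice_id
        # None means there is no history for this slice yet, so no prediction is made.
        return self.successor.get(slice_id)


def accuracy(trace: List[int]) -> float:
    """Fraction of issued predictions that matched the slice actually used next."""
    predictor = NextSlicePredictor()
    correct = 0
    predicted = 0
    guess: Optional[int] = None
    for slice_id in trace:
        if guess is not None:
            predicted += 1
            correct += int(guess == slice_id)
        guess = predictor.observe(slice_id)
    return correct / predicted if predicted else 0.0
```

In a design along the lines the abstract describes, a non-empty guess would be the point at which a circuit toward the predicted slice is reserved ahead of the next miss; here the guess is only scored against a trace of slice ids.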

Pages: 1416-1429

Number of pages: 14
