Fast Data Delivery for Many-Core Processors

Cited by: 18
Authors
Bakhshalipour, Mohammad [1, 2]
Lotfi-Kamran, Pejman [2]
Mazloumi, Abbas [3, 4]
Samandi, Farid [1, 2]
Naderan-Tahan, Mahmood [5]
Modarressi, Mehdi [6]
Sarbazi-Azad, Hamid [2, 7]
Affiliations
[1] SUT, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
[3] Univ Tehran, Tehran, Iran
[4] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[5] Shahid Chamran Univ Ahvaz SCU, Dept Comp Engn, Fac Engn, Ahvaz, Khuzestan, Iran
[6] Univ Tehran, Sch Elect & Comp Engn, Tehran, Iran
[7] SUT, Dept Comp Engn, Tehran, Iran
Funding
National Science Foundation (NSF)
Keywords
Memory system; network-on-chip; circuit switching; data prefetching; ON-CHIP
DOI
10.1109/TC.2018.2821144
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Server workloads operate on large volumes of data. As a result, processors executing these workloads encounter frequent L1-D misses. In a many-core processor, an L1-D miss causes a request packet to be sent to an LLC slice and a response packet to be sent back to the L1-D, which results in high overhead. While prior work targeted response packets, this work focuses on accelerating the request packets. Unlike aggressive OoO cores, the simpler cores used in many-core processors cannot hide the latency of L1-D request packets. We observe that the LLC slices that serve L1-D misses are strongly temporally correlated. Taking advantage of this observation, we design a simple and accurate predictor. Upon an L1-D miss, the predictor identifies the LLC slice that will serve the next L1-D miss, and a circuit is set up for the upcoming miss request to accelerate its transmission. When the upcoming miss occurs, the resulting request can use the already-established circuit to reach the LLC slice. We show that our proposal outperforms data prefetching mechanisms in a many-core processor due to (1) higher prediction accuracy and (2) not wasting valuable off-chip bandwidth, while requiring significantly less overhead. Using full-system simulation, we show that our proposal accelerates serving data misses by 22 percent and leads to a 10 percent performance improvement over the state-of-the-art network-on-chip.
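The abstract does not spell out how the predictor is organized. As a rough illustration of exploiting temporal correlation among the LLC slices that serve misses, the Python sketch below assumes a per-core successor table: for each slice, it remembers which slice served the miss that followed, and it predicts that slice when the same slice is seen again. The address-to-slice mapping in `slice_of` (64-byte blocks interleaved over 16 slices) and the names `NextSlicePredictor` and `prediction_accuracy` are hypothetical, chosen for illustration only; they are not the paper's actual design or parameters.

```python
class NextSlicePredictor:
    """Hypothetical successor predictor (one instance per core).

    For each LLC slice it records the slice that served the *next* miss,
    and predicts that successor whenever the slice is seen again.
    """

    def __init__(self):
        self.successor = {}     # slice id -> slice that served the following miss
        self.last_slice = None  # slice that served the most recent miss

    def on_miss(self, served_slice):
        """Record the slice serving the current miss and return the slice
        predicted to serve the next miss (None if there is no history yet).
        In hardware, a non-None prediction would trigger circuit setup
        toward that slice; here it is simply returned."""
        if self.last_slice is not None:
            self.successor[self.last_slice] = served_slice
        self.last_slice = served_slice
        return self.successor.get(served_slice)


def slice_of(addr, num_slices=16, block_bits=6):
    """Assumed static mapping: cache blocks interleaved across LLC slices."""
    return (addr >> block_bits) % num_slices


def prediction_accuracy(slice_trace):
    """Fraction of misses whose serving slice was predicted correctly,
    counting only misses for which a prediction was available."""
    pred = NextSlicePredictor()
    predicted = None
    hits = total = 0
    for s in slice_trace:
        if predicted is not None:
            total += 1
            hits += predicted == s
        predicted = pred.on_miss(s)
    return hits / total if total else 0.0
```

For example, a core that alternates between two slices, `prediction_accuracy([3, 7, 3, 7, 3, 7])`, is predicted perfectly once the table warms up, whereas a less regular trace such as `[1, 2, 3, 1, 2, 4]` yields 0.5. A last-successor table is one of the simplest ways to capture this kind of correlation; its appeal, in the spirit of the abstract, is that it needs only one entry per slice and never issues off-chip traffic.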
Pages: 1416-1429
Number of pages: 14