Per-Thread Cycle Accounting in SMT Processors

Cited by: 28
Authors
Eyerman, Stijn [1]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, ELIS Dept, Ghent, Belgium
Keywords
Design; Experimentation; Performance; Simultaneous Multithreading (SMT); Cycle accounting; Thread-progress aware fetch policy
DOI
10.1145/1508284.1508260
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates, for each thread, the execution time it would have had if executed alone, while the threads are running simultaneously on the SMT processor. This is done by accounting each cycle to either a base, miss event, or waiting cycle component during multi-threaded execution. Single-threaded alone execution time is then estimated as the sum of the base and miss event components; the waiting cycle component represents the cycles lost due to SMT execution. The cycle accounting architecture incurs reasonable hardware cost (around 1KB of storage) and estimates single-threaded performance with average prediction errors of around 7.2% for two-program workloads and 11.7% for four-program workloads. The cycle accounting architecture has several important applications to system software and its interaction with SMT hardware. First, the estimated single-thread alone execution time gives system software an accurate picture of the processor cycles actually consumed per thread. Using the alone execution time instead of the total execution time (timeslice) may make system software scheduling policies more effective. Second, a new class of thread-progress aware SMT fetch policies based on per-thread progress indicators enables system-software-level priorities to be enforced at the hardware level.
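The accounting rule described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's hardware design: the class name `CycleCounters`, its methods, and the cycle counts below are hypothetical and chosen only for illustration. The sketch shows the one relation the abstract states, namely that the estimated alone execution time is the sum of the base and miss event components, while the waiting component is the time lost to SMT execution.

```python
from dataclasses import dataclass


@dataclass
class CycleCounters:
    """Hypothetical per-thread counters; names are illustrative only."""
    base: int = 0        # cycles in which the thread makes normal progress
    miss_event: int = 0  # cycles attributed to the thread's own miss events
    waiting: int = 0     # cycles lost because of co-running SMT threads

    def account(self, component: str, cycles: int = 1) -> None:
        """Attribute `cycles` to exactly one of the three components."""
        if component not in ("base", "miss_event", "waiting"):
            raise ValueError(f"unknown component: {component}")
        setattr(self, component, getattr(self, component) + cycles)

    @property
    def total_smt_cycles(self) -> int:
        """Cycles the thread spent on the SMT processor (all components)."""
        return self.base + self.miss_event + self.waiting

    @property
    def estimated_alone_cycles(self) -> int:
        """Estimated single-threaded time: base plus miss event cycles."""
        return self.base + self.miss_event

    @property
    def estimated_slowdown(self) -> float:
        """SMT execution time relative to the estimated alone time."""
        return self.total_smt_cycles / self.estimated_alone_cycles


# Made-up counts for one thread, purely for illustration.
thread0 = CycleCounters()
thread0.account("base", 600)
thread0.account("miss_event", 200)
thread0.account("waiting", 400)

print(thread0.estimated_alone_cycles)  # 800
print(thread0.estimated_slowdown)      # 1.5
```

In this sketch, system software that charged the thread its estimated alone cycles (800) rather than its full timeslice (1200) would be following the use case the abstract suggests for scheduling.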
Pages: 133-144
Number of pages: 12
Related papers
50 in total
  • [1] Per-Thread Cycle Accounting in Multicore Processors
    Du Bois, Kristof
    Eyerman, Stijn
    Eeckhout, Lieven
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 9 (04)
  • [2] PER-THREAD CYCLE ACCOUNTING
    Eyerman, Stijn
    Eeckhout, Lieven
    [J]. IEEE MICRO, 2010, 30 (01) : 71 - 80
  • [3] A Real-Time Per-Thread IQ-Capping Technique for Simultaneous Multi-Threading (SMT) Processors
    Sahba, Amin
    Zhang, Yilin
    Hays, Marcus
    Lin, Wei-Ming
    [J]. 2014 11TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS (ITNG), 2014, : 413 - 418
  • [4] KARD: Lightweight Data Race Detection with Per-Thread Memory Protection
    Ahmad, Adil
    Lee, Sangho
    Fonseca, Pedro
    Lee, Byoungyoung
    [J]. ASPLOS XXVI: TWENTY-SIXTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2021, : 647 - 660
  • [5] Federation: Boosting Per-Thread Performance of Throughput-Oriented Manycore Architectures
    Boyer, Michael
    Tarjan, David
    Skadron, Kevin
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2010, 7 (04)
  • [6] Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors
    Feliu, Josue
    Sahuquillo, Julio
    Petit, Salvador
    Eeckhout, Lieven
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (02) : 359 - 373
  • [7] Eliminating inter-thread interference in register file for SMT processors
    Yang, H
    Cui, G
    Yang, XZ
    [J]. PDCAT 2005: SIXTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2005, : 40 - 45
  • [8] Using instruction fetch policy to control performance of a thread in SMT processors
    School of Computer Science, National University of Defense Technology, Changsha 410073, China
    [J]. Jisuanji Xuebao, 2008, (2) : 309 - 317
  • [9] Fair CPU Time Accounting in CMP+SMT Processors
    Luque, Carlos
    Moreto, Miquel
    Cazorla, Francisco J.
    Valero, Mateo
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 9 (04)
  • [10] L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors
    Feliu, Josue
    Sahuquillo, Julio
    Petit, Salvador
    Duato, Jose
    [J]. 2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2013, : 123 - 132