Using Multiple Threads to Accelerate Single Thread Performance

Cited by: 1
Authors
Sura, Zehra [1]
O'Brien, Kevin [1]
Brunheroto, Jose [1]
Affiliations
[1] IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
DOI
10.1109/IPDPS.2014.104
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Computing systems are being designed with an increasing number of hardware cores. To effectively use these cores, applications need to maximize the amount of parallel processing and minimize the time spent in sequential execution. In this work, we aim to exploit fine-grained parallelism beyond the parallelism already encoded in an application. We define an execution model using a primary core and some number of secondary cores that collaborate to speed up the execution of sequential code regions. This execution model relies on cores that are physically close to each other and have fast communication paths between them. For this purpose, we introduce dedicated hardware queues for low-latency transfer of values between cores, and define special enque and deque instructions to use the queues. Further, we develop compiler analyses and transformations to automatically derive fine-grained parallel code from sequential code regions. We implemented this model for exploiting fine-grained parallelization in the IBM XL compiler framework and in a simulator for the Blue Gene/Q system. We also studied the Sequoia benchmarks to determine code sections where our techniques are applicable. We evaluated our work using these code sections, and observed an average speedup of 1.32 on 2 cores, and an average speedup of 2.05 on 4 cores. Since these code sections are otherwise sequentially executed, we conclude that our approach is useful for accelerating single thread performance.
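The primary/secondary execution model described in the abstract can be illustrated with a small software analogy. This is only a sketch under stated assumptions, not the authors' implementation: a bounded `queue.Queue` stands in for a dedicated hardware queue, `put` and `get` stand in for the enque and deque instructions, and the function names, the loop body, and the `depth` parameter are all hypothetical. Because of Python's global interpreter lock, the sketch shows the division of work within each loop iteration, not an actual speedup.

```python
import queue
import threading


def sequential(a, b):
    """Reference version: the original sequential loop."""
    n = min(len(a), len(b))
    return [(a[i] * a[i] + 1) + (b[i] * 3 - 2) for i in range(n)]


def fine_grained(a, b, depth=4):
    """Split each loop iteration between a primary and one secondary thread.

    Assumption: the bounded FIFO below is a software stand-in for a
    low-latency hardware queue between two neighbouring cores.
    """
    n = min(len(a), len(b))
    q = queue.Queue(maxsize=depth)

    def secondary():
        # The secondary thread computes one sub-expression per iteration
        # and sends the value to the primary thread.
        for i in range(n):
            q.put(b[i] * 3 - 2)      # stand-in for "enque"

    worker = threading.Thread(target=secondary)
    worker.start()

    out = []
    for i in range(n):
        left = a[i] * a[i] + 1       # primary thread's share of the work
        right = q.get()              # stand-in for "deque"; blocks until ready
        out.append(left + right)

    worker.join()
    return out


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(fine_grained(a, b) == sequential(a, b))
```

The FIFO ordering of the queue is what keeps the results aligned with the loop iterations: the secondary thread produces values in iteration order and the primary thread consumes them in the same order, so no extra synchronization is needed in this sketch.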
Pages: 10