Time and Space-Efficient Write Parallelism in PCM by Exploiting Data Patterns

Cited by: 5
Authors
Li, Zheng [1]
Wang, Fang [1]
Feng, Dan [1]
Hua, Yu [1]
Liu, Jingning [1]
Tong, Wei [1]
Chen, Yu [1]
Harb, Salah S. [1]
Affiliation
[1] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Key Lab Informat Storage Syst, Sch Comp Sci & Technol, Minist Educ China, Sheng 430074, Hubei, Peoples R China
Keywords
PCM; write unit; performance evaluation; write energy; phase-change memory
DOI
10.1109/TC.2017.2677903
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The size of a write unit in PCM, namely the number of bits that can be written concurrently, is restricted by high write energy consumption. When PCM is used as main memory, servicing a cache line typically requires several serially executed write units, which results in long write latency and high energy consumption. To address this poor write performance, we propose a novel PCM write scheme called Min-WU (Minimize the number of Write Units). We observe a data access locality in which a few frequent zero-extended values dominate the write data patterns of typical multi-threaded applications (more than 40 and 44.9 percent of all memory accesses in PARSEC workloads and SPEC 2006 benchmarks, respectively). A carefully designed chip-level data redistribution method balances the data amount and makes the data pattern the same across all PCM chips. The key idea behind Min-WU is to minimize the number of serially executed write units in a cache line service after data redistribution, through sFPC (simplified Frequent Pattern Compression), eRW (efficient Reordering Write operations method) and fWP (fine-tuned Write Parallelism circuits). With Min-WU, the zero parts of write units are indicated with predefined prefixes, and the residues are reordered and written simultaneously under power constraints. Our design improves the performance, energy consumption and endurance of PCM-based main memory with low space and time overhead. Experimental results on 12 multi-threaded PARSEC 2.0 workloads show that Min-WU reduces read latency by 44 percent, write latency by 28 percent, running time by 32.5 percent and energy by 48 percent, while achieving a 32 percent IPC improvement over the conventional write scheme, at a cost of a few memory cycles and less than 3 percent storage space overhead. Evaluation results on 8 SPEC 2006 benchmarks demonstrate that Min-WU achieves 57.8/46.0 percent read/write latency reduction, 28.7 percent IPC improvement, 28 percent running time reduction and 62.1 percent energy reduction compared with the baseline under realistic memory hierarchy configurations.
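The abstract's central observation, that zero-extended values let a cache line fit into fewer serially executed write units, can be illustrated with a toy model. The sketch below is not the authors' sFPC encoding or the Min-WU circuit: the 2-bit prefix width, the 8/16/32-bit payload classes, the 128-bit write unit, and the function names are all assumptions chosen for illustration, and chip-level redistribution, reordering and power budgeting are ignored.

```python
import math

WORD_BITS = 32     # a 64-byte cache line is modeled as 16 words of 32 bits
PREFIX_BITS = 2    # assumed prefix width (hypothetical, not from the paper)


def payload_bits(word: int) -> int:
    """Bits kept for one word under a toy zero-extension encoding."""
    if word == 0:
        return 0            # all-zero word: the prefix alone describes it
    if word < (1 << 8):
        return 8            # value fits in 8 bits, upper 24 bits are zero
    if word < (1 << 16):
        return 16           # value fits in 16 bits, upper 16 bits are zero
    return WORD_BITS        # no zero extension to exploit


def write_units(words, unit_bits=128, compress=True):
    """Number of serial write units needed for one cache line.

    unit_bits is an assumed write-unit size; each write unit is modeled
    as able to program at most that many bits concurrently.
    """
    total = 0
    for w in words:
        if compress:
            total += PREFIX_BITS + payload_bits(w)
        else:
            total += WORD_BITS
    return math.ceil(total / unit_bits)


# A line dominated by zero and small values, as the abstract describes.
line = [0] * 8 + [5, 300, 70000, 0xFFFFFFFF] + [0] * 4
baseline = write_units(line, compress=False)
compressed = write_units(line, compress=True)
print(baseline, compressed)
```

In this model a line with no compressible words costs more bits than the raw line because of the prefixes, so a practical scheme would need a fallback to the uncompressed path; that detail is left out here.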
Pages: 1629-1644
Number of pages: 16