BPM/BPM+: Software-Based Dynamic Memory Partitioning Mechanisms for Mitigating DRAM Bank-/Channel-Level Interferences in Multicore Systems

Cited by: 11
Authors
Liu, Lei [1, 3]
Cui, Zehan [1, 3, 4]
Li, Yong [2]
Bao, Yungang [1, 3]
Chen, Mingyu [1, 3]
Wu, Chengyong [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100864, Peoples R China
[2] Univ Pittsburgh, Dept ECE, Pittsburgh, PA 15260 USA
[3] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100864, Peoples R China
[4] Chinese Acad Sci, Grad Sch, Beijing 100864, Peoples R China
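Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Keywords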
Management; Performance; Design; Main memory; multicore; interference; memory scheduling;
DOI
10.1145/2579672
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The main memory system is a shared resource in modern multicore machines, and contention for it can cause serious interference, leading to reduced throughput and unfairness. Many new memory scheduling mechanisms have been proposed to address the interference problem. However, these mechanisms usually employ relatively complex scheduling logic and require modifications to Memory Controllers (MCs), which incur expensive hardware design and manufacturing overheads. This article presents a practical software approach that effectively eliminates the interference without any hardware modifications. The key idea is to modify the OS memory management system and adopt a page-coloring-based Bank-level Partitioning Mechanism (BPM) that allocates dedicated DRAM banks to each core (or thread). With BPM, memory requests from distinct programs are segregated across multiple memory banks to promote locality and fairness and to reduce interference. We further extend BPM to BPM+ by incorporating channel-level partitioning, which yields additional gains over BPM in many cases. To achieve benefits in the presence of diverse application memory needs and to avoid performance degradation due to resource underutilization, we propose a dynamic mechanism on top of BPM/BPM+ that assigns appropriate bank/channel resources based on application memory/bandwidth demands, monitored through the PMU (performance-monitoring unit) and a low-overhead OS page table scanning process. We implement BPM/BPM+ in the Linux 2.6.32.15 kernel and evaluate the technique on four-core and eight-core real machines by running a large number of randomly generated multiprogrammed and multithreaded workloads. Experimental results show that BPM/BPM+ can improve overall system throughput by 4.7%/5.9% on average (up to 8.6%/9.5%) and reduce unfairness by an average of 4.2%/6.1% (up to 15.8%/13.9%).
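The page-coloring idea in the abstract can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the authors' kernel implementation: the bank-selecting address bits (`BANK_BITS`), the 4 KiB page size, and the `ColoredAllocator` class with its `assign` and `alloc_page` methods are all hypothetical names chosen for illustration. Real address-to-bank mappings are platform-specific.

```python
from collections import defaultdict

PAGE_SHIFT = 12            # assumed 4 KiB pages
BANK_BITS = (13, 14, 15)   # ASSUMPTION: physical-address bits that select the DRAM bank


def bank_color(phys_addr: int) -> int:
    """Pack the assumed bank-index bits of an address into a small 'color'."""
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((phys_addr >> bit) & 1) << i
    return color


class ColoredAllocator:
    """Toy frame allocator that keeps one free list per bank color (hypothetical)."""

    def __init__(self, num_frames: int):
        self.free = defaultdict(list)   # color -> list of free page frame numbers
        for pfn in range(num_frames):
            self.free[bank_color(pfn << PAGE_SHIFT)].append(pfn)
        self.colors_of = {}             # core id -> colors dedicated to that core

    def assign(self, core: int, colors):
        """Dedicate a set of bank colors to a core."""
        self.colors_of[core] = list(colors)

    def alloc_page(self, core: int) -> int:
        """Return a free frame whose bank color belongs to the requesting core."""
        for color in self.colors_of[core]:
            if self.free[color]:
                return self.free[color].pop()
        raise MemoryError(f"no free frames left in the banks of core {core}")


alloc = ColoredAllocator(num_frames=64)
alloc.assign(0, [0, 1])   # core 0 owns bank colors 0 and 1
alloc.assign(1, [2, 3])   # core 1 owns bank colors 2 and 3
frame = alloc.alloc_page(0)
print(frame, bank_color(frame << PAGE_SHIFT))
```

Because each core draws frames only from its own colors, requests from different cores never land in the same bank in this model. A dynamic variant along the lines the abstract describes would call `assign` again with a larger or smaller color set as measured demand changes.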
Pages: 28