Review of Memory RAS for Data Centers

Times cited: 0
Authors
Lee, Jiseong [1]
Kim, Min Joon [1]
Kim, Woo-Seop [1]
Kim, Yong Sin [1]
Affiliation
[1] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Error correction codes; Data centers; Costs; Servers; Performance evaluation; Memory modules; Maintenance engineering; Reliability engineering; Correctable error (CE); error correction code (ECC); memory reliability; availability; serviceability (RAS); uncorrectable error (UE); ERROR-CORRECTION; CODES; DRAM; RELIABILITY; ECC; RESILIENCE; STORAGE;
DOI
10.1109/ACCESS.2023.3329984
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multi-bit errors and downtime caused by uncorrectable errors (UEs) in dual in-line memory modules (DIMMs) have received great attention in data centers because of their high repair and replacement costs. These problems can be alleviated with error correction code (ECC) technology. ECC corrects errors promptly when they first occur, and recurring error patterns can be used to predict future UEs. The technologies for handling errors fall under reliability, availability, and serviceability (RAS). They need to be optimized against metrics such as accuracy, recall, F-measure, and cost reduction. This paper presents an overview of current RAS technologies and trends in data center memory, including an analysis of conventional ECC technologies and their recent developments. Because ECC cannot completely eliminate UEs, page offlining based on error-pattern analysis and UE characterization can be applied. Recent research on reducing the memory capacity lost to UEs and page offlining has moved toward on-die ECC in high-bandwidth memory (HBM) architectures.
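The abstract names accuracy, recall, and F-measure as parameters for tuning UE prediction. As an illustration only, the Python sketch below computes these metrics from confusion-matrix counts for a hypothetical per-DIMM UE predictor. The function name `ue_prediction_metrics`, the per-DIMM framing, and the example counts are assumptions and are not taken from the paper.

```python
# Illustrative sketch: scoring a HYPOTHETICAL per-DIMM UE predictor with the
# metrics named in the abstract. Counts below are placeholders, not paper data.


def ue_prediction_metrics(tp: int, fp: int, fn: int, tn: int, beta: float = 1.0) -> dict:
    """Return accuracy, precision, recall, and F-beta for a binary UE predictor.

    tp: DIMMs predicted to develop a UE that actually did
    fp: DIMMs predicted to develop a UE that did not (needless replacement cost)
    fn: DIMMs that developed a UE without being predicted (unplanned downtime)
    tn: DIMMs correctly predicted to stay UE-free
    beta: weight of recall relative to precision (beta=1 gives the F1 score)
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); defined as 0 when P = R = 0
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_beta": f_beta}


if __name__ == "__main__":
    metrics = ue_prediction_metrics(tp=40, fp=10, fn=20, tn=930)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.970 but recall is only 0.667. UE-prone DIMMs are rare, so accuracy alone can look high even when a third of future UEs are missed. This is one reason recall and F-measure are reported alongside it. Setting `beta` above 1 weights recall more heavily than precision, which suits cases where a missed UE costs more than an unnecessary replacement.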
Pages: 124782-124796
Page count: 15