Review of Memory RAS for Data Centers

Cited by: 0
Authors
Lee, Jiseong [1]
Kim, Min Joon [1]
Kim, Woo-Seop [1]
Kim, Yong Sin [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea
Funding
National Research Foundation, Singapore
Keywords
Error correction codes; Data centers; Costs; Servers; Performance evaluation; Memory modules; Maintenance engineering; Reliability engineering; Correctable error (CE); error correction code (ECC); memory reliability, availability, serviceability (RAS); uncorrectable error (UE); ERROR-CORRECTION; CODES; DRAM; RELIABILITY; ECC; RESILIENCE; STORAGE
DOI
10.1109/ACCESS.2023.3329984
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-bit errors and downtime due to uncorrectable errors (UEs) in dual in-line memory modules (DIMMs) have received great attention in data centers because of their high repair or replacement cost. These problems can be alleviated by error correction code (ECC) technology, which enables prompt correction of errors when they first occur and prediction of future UEs from recurring error patterns. The technologies for addressing errors can be categorized into reliability, availability, and serviceability (RAS), and need to be optimized using various parameters such as accuracy, recall, F-measures, and cost reduction. This paper gives an overview of current RAS technologies and trends in memory for data centers, including an analysis of conventional ECC technologies and their recent developments. Because UEs cannot be completely eliminated with ECCs, page offline methods based on analysis of error patterns and characterization of UEs can be applied. Recent research on reducing the memory capacity wasted by UEs and page offlining has moved toward on-die ECC in high bandwidth memory architectures.
Pages: 124782-124796
Number of pages: 15
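The abstract names accuracy, recall, and F-measures as parameters for optimizing UE prediction. As an illustration only, the following minimal sketch computes these quantities from a confusion matrix, assuming a hypothetical binary predictor that flags DIMMs as likely to suffer a UE. The function name `prediction_metrics` and the example counts are assumptions for illustration and are not taken from the paper.

```python
def prediction_metrics(tp, fp, fn, tn):
    """Compute standard classification metrics for a UE predictor.

    tp: DIMMs flagged as at risk that later had a UE (true positives)
    fp: DIMMs flagged as at risk that stayed healthy (false positives)
    fn: DIMMs not flagged that later had a UE (false negatives)
    tn: DIMMs not flagged that stayed healthy (true negatives)
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for 100 DIMMs; not data from the paper.
    for name, value in prediction_metrics(tp=8, fp=2, fn=2, tn=88).items():
        print(f"{name}: {value:.3f}")
```

Because UEs are rare, accuracy alone can look high even for a predictor that misses most failures, which is one reason recall and F-measures are usually reported alongside it.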