Automated data cleaning for data centers: A case study

Cited by: 0
Authors
Haider, Syed Naeem
Zhao, Qianchuan [1]
Meran, Bushra Kainat
Affiliations
[1] Tsinghua University, Beijing 100084, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Data preprocessing; Machine learning; Preprocessing; Data mining;
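DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract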
Preprocessing raw data is a critical stage in machine learning; its fundamental objective is to prepare a clean, error-free data set for data analytics algorithms. Transforming raw data into clean data is a basic requirement in the industrial and commercial sectors, but it poses many challenges that have to be addressed individually and manually. Because no unified framework incorporates all the components required to transform raw data into clean data, the transformation is done by hand, which is ineffective and very time consuming. We discuss a case study of cleaning data in a data center, comparing forecast-based filling of missing values with mean-value replacement, and propose an automated data preprocessing framework for data cleaning. The proposed framework successfully cleans data sets automatically instead of handling multiple problems separately and manually.
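The two missing-value strategies compared in the abstract can be illustrated with a minimal sketch. The abstract does not say which forecasting method the authors use, so the moving-average forecast below is an assumption chosen for brevity, and the function names (`fill_with_mean`, `fill_with_forecast`), the `window` parameter, and the sample readings are all hypothetical.

```python
from statistics import mean
from typing import List, Optional


def fill_with_mean(series: List[Optional[float]]) -> List[float]:
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    series_mean = mean(observed)
    return [series_mean if v is None else v for v in series]


def fill_with_forecast(series: List[Optional[float]], window: int = 3) -> List[float]:
    """Replace each missing value with a naive forecast.

    Assumption: the forecast is the average of up to `window` preceding
    values (observed or already filled). This stands in for whatever
    forecasting model a real pipeline would use.
    """
    filled: List[float] = []
    for v in series:
        if v is None:
            history = filled[-window:]
            if not history:
                raise ValueError("cannot forecast before the first observation")
            v = mean(history)
        filled.append(v)
    return filled


# Hypothetical sensor readings with two gaps.
readings = [20.0, 21.0, None, 23.0, None, 25.0]
mean_filled = fill_with_mean(readings)
forecast_filled = fill_with_forecast(readings)
```

Mean replacement assigns the same value to every gap regardless of where it occurs, whereas the forecast-style fill depends only on the values that precede each gap, so it can follow a local trend in time-ordered data.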
Pages: 3227-3232
Page count: 6