A Multi-purpose Data Pre-processing Framework using Machine Learning for Enterprise Data Models

Authors
Ramana, Venkata B. [1]
Narsimha, G. [1]
Affiliation
[1] JNTU, Dept. of Computer Science & Engineering, Hyderabad, India
Keywords
Standard domain length; domain specific rule engine; double differential clustering; change percentage; dependency map
DOI
10.14569/IJACSA.2021.0120376
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Classification Code
081202
Abstract
Growth in the data processing industry has automated decision making in domains such as engineering, education and many fields of research. This growth has also increased the dependence of business decisions on enterprise-scale data models, and the accuracy of such decisions depends directly on the correctness of the underlying data. Many data cleaning methods have been proposed in recent research; however, most of them are criticized for being either too generic or too specific. There is therefore a growing need for a multi-purpose, yet domain-specific, framework for enterprise-scale data pre-processing. This work proposes a novel data cleaning framework consisting of missing value identification using the standard domain length, with significantly reduced time complexity; domain-specific outlier identification using a customizable rule engine; generic outlier reduction using double differential clustering; and dimensionality reduction using change percentage dependency mapping. On a benchmark dataset, the framework's outlier and missing value treatment achieves nearly 99% accuracy.
Pages: 646–656
Page count: 11
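The abstract names a customizable rule engine for domain-specific outlier identification but gives no implementation details. A minimal Python sketch of the general idea follows, assuming that rules are per-attribute predicates a domain expert registers at runtime. The class `RuleEngine`, its methods, and the example rules for `age` and `salary` are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A rule predicate returns True when a value is acceptable for its attribute.
Predicate = Callable[[Any], bool]


@dataclass
class RuleEngine:
    """Hypothetical minimal rule engine; not the implementation from the paper."""

    # attribute name -> list of (rule name, predicate)
    rules: Dict[str, List[Tuple[str, Predicate]]] = field(default_factory=dict)

    def add_rule(self, attribute: str, name: str, predicate: Predicate) -> None:
        # Rules are registered at runtime, so each domain can supply its own set.
        self.rules.setdefault(attribute, []).append((name, predicate))

    def check(self, record: Dict[str, Any]) -> List[str]:
        # Return "attribute:rule" labels for every rule the record violates.
        violations = []
        for attribute, named_rules in self.rules.items():
            value = record.get(attribute)  # a missing attribute is checked as None
            for name, predicate in named_rules:
                try:
                    ok = predicate(value)
                except (TypeError, ValueError):
                    ok = False  # a value the rule cannot evaluate counts as a violation
                if not ok:
                    violations.append(f"{attribute}:{name}")
        return violations

    def flag_outliers(self, records: List[Dict[str, Any]]) -> Dict[int, List[str]]:
        # Map record index -> violated rules, for records that break at least one rule.
        report = {}
        for index, record in enumerate(records):
            violations = self.check(record)
            if violations:
                report[index] = violations
        return report


# Hypothetical domain rules for an employee table.
engine = RuleEngine()
engine.add_rule("age", "working_age", lambda v: isinstance(v, int) and 18 <= v <= 70)
engine.add_rule("salary", "non_negative", lambda v: isinstance(v, (int, float)) and v >= 0)

records = [
    {"age": 34, "salary": 52000},
    {"age": 7, "salary": 48000},
    {"age": 45, "salary": -10},
]
print(engine.flag_outliers(records))  # {1: ['age:working_age'], 2: ['salary:non_negative']}
```

Keeping the rules as data (attribute, rule name, predicate) means a new domain can be supported by registering a different rule set rather than changing the engine itself.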