A Multi-purpose Data Pre-processing Framework using Machine Learning for Enterprise Data Models

Times cited: 0
Authors
Ramana, Venkata B. [1]
Narsimha, G. [1]
Affiliations
[1] JNTU, Dept Comp Sci & Engn, Hyderabad, India
Keywords
Standard domain length; domain specific rule engine; double differential clustering; change percentage; dependency map;
DOI
10.14569/IJACSA.2021.0120376
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Growth in the data processing industry has automated decision making in domains such as engineering, education, and many fields of research. This growth has also increased the dependence of business decisions on enterprise-scale data models. The accuracy of such decisions depends solely on the correctness of the data. In the recent past, a good number of data cleaning methods have been proposed by various research efforts. Nonetheless, most of these are criticized for being either too generic or too specific. Thus, a multi-purpose yet domain-specific framework for enterprise-scale data pre-processing is in demand. Hence, this work proposes a novel data cleaning framework that combines four steps: missing value identification using the standard domain length, with significantly reduced time complexity; domain-specific outlier identification using a customizable rule engine; detailed generic outlier reduction using double differential clustering; and dimensionality reduction using change percentage dependency mapping. The outlier and missing value treatments achieve nearly 99% accuracy on the benchmarked dataset.
Pages: 646 - 656
Number of pages: 11
Related papers
50 in total
  • [11] A New Data Analytics Framework Emphasising Pre-processing in Learning AI Models for Complex Manufacturing Systems
    Carbery, Caoimhe M.
    Woods, Roger
    Marshall, Adele H.
    INTELLIGENT COMPUTING AND INTERNET OF THINGS, PT II, 2018, 924 : 169 - 179
  • [12] Comparative Study of Machine Learning Techniques for Pre-processing of Network Intrusion Data
    Rahat, Faiza
    Ahsan, Syed Nadeem
    2015 INTERNATIONAL CONFERENCE ON OPEN SOURCE SYSTEMS & TECHNOLOGIES (ICOSST), 2015, : 46 - 51
  • [13] Package Proposal for Data Pre-Processing for Machine Learning Applied to Precision Irrigation
    dos Santos, Rogerio Pereira
    Beko, Marko
    Leithardt, Valderi R. Q.
    2023 6TH CONFERENCE ON CLOUD AND INTERNET OF THINGS, CIOT, 2023, : 141 - 148
  • [14] Review of Data Pre-processing Techniques and Machine Learning in PTR-MS
    Sun Y.
    Chen Y.-B.
    Chu M.-J.
    Jiang X.-H.
    Wang Y.
    Guo B.-Q.
    2018, Chinese Society for Mass Spectrometry (39) : 513 - 523
  • [15] Human Multi-omics Data Pre-processing for Predictive Purposes Using Machine Learning: A Case Study in Childhood Obesity
    Torres-Martos, Alvaro
    Anguita-Ruiz, Augusto
    Bustos-Aibar, Mireia
    Camara-Sanchez, Sofia
    Alcala, Rafael
    Aguilera, Concepcion M.
    Alcala-Fdez, Jesus
    BIOINFORMATICS AND BIOMEDICAL ENGINEERING, PT II, 2022, : 359 - 374
  • [16] An Enhanced Pre-Processing Model for Big Data Processing: A Quality Framework
    Lincy, Blessy Trencia S. S.
    Kumar, N. Suresh
    2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN GREEN ENERGY AND HEALTHCARE TECHNOLOGIES (IGEHT), 2017,
  • [17] PRESISTANT: Learning based assistant for data pre-processing
    Bilalli, Besim
    Abello, Alberto
    Aluja-Banet, Tomas
    Wrembel, Robert
    DATA & KNOWLEDGE ENGINEERING, 2019, 123
  • [18] Semantic Data Pre-Processing for Machine Learning Based Bankruptcy Prediction Computational Model
    Yerashenia, Natalia
    Bolotov, Alexander
    Chan, David
    Pierantoni, Gabriele
    2020 IEEE 22ND CONFERENCE ON BUSINESS INFORMATICS (CBI 2020), VOL I - RESEARCH PAPERS, 2020, : 66 - 75
  • [19] HeuristicModeler: A Multi-Purpose Evolutionary Machine Learning Algorithm and its Applications in Medical Data Analysis
    Winkler, Stephan
    Affenzeller, Michael
    Wagner, Stefan
    INTERNATIONAL MEDITERRANEAN MODELLING MULTICONFERENCE 2006, 2006, : 629 - 634
  • [20] Pre-processing for data clustering
    Frigui, H
    NAFIPS 2004: ANNUAL MEETING OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY, VOLS 1 AND 2: FUZZY SETS IN THE HEART OF THE CANADIAN ROCKIES, 2004, : 967 - 972