A Multi-purpose Data Pre-processing Framework using Machine Learning for Enterprise Data Models

Cited by: 0
Authors
Ramana, Venkata B. [1]
Narsimha, G. [1]
Affiliations
[1] JNTU, Dept Comp Sci & Engn, Hyderabad, India
Keywords
Standard domain length; domain-specific rule engine; double differential clustering; change percentage; dependency map
DOI
10.14569/IJACSA.2021.0120376
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Growth in the data processing industry has automated decision making in domains such as engineering, education, and many fields of research. This growth has also increased the dependence of business decisions on enterprise-scale data models. The accuracy of such decisions depends solely on the correctness of the data. In the recent past, a good number of data cleaning methods have been proposed by various research efforts. Nonetheless, most of these methods are criticized for being either too generic or too specific. Thus, a multi-purpose yet domain-specific framework for enterprise-scale data pre-processing is currently in demand. Hence, this work proposes a novel data cleaning framework with four parts: missing value identification using the standard domain length, with significantly reduced time complexity; domain-specific outlier identification using a customizable rule engine; detailed generic outlier reduction using double differential clustering; and, finally, dimensionality reduction using change percentage dependency mapping. The framework's outlier and missing value treatment achieves nearly 99% accuracy on a benchmark dataset.
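The abstract does not spell out how the first two stages work, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that the "standard domain length" of a column is the most common string length among its non-empty values, and that the rule engine is a mapping from column names to predicates. The function names `standard_length`, `flag_missing`, and `flag_rule_outliers` are hypothetical.

```python
from collections import Counter


def standard_length(values):
    """Most common string length among non-empty values.

    Assumption: this is one plausible reading of "standard domain length";
    the abstract does not define the term.
    """
    lengths = [len(str(v)) for v in values
               if v is not None and str(v).strip() != ""]
    return Counter(lengths).most_common(1)[0][0] if lengths else 0


def flag_missing(values):
    """Indices of values that are empty or deviate from the standard length."""
    std = standard_length(values)
    return [i for i, v in enumerate(values)
            if v is None or str(v).strip() == "" or len(str(v)) != std]


def flag_rule_outliers(rows, rules):
    """(row_index, column) pairs where a domain rule rejects the value.

    `rules` maps a column name to a predicate returning True for valid values
    (a hypothetical stand-in for the customizable rule engine).
    """
    flagged = []
    for i, row in enumerate(rows):
        for col, rule in rules.items():
            if col in row and not rule(row[col]):
                flagged.append((i, col))
    return flagged
```

A length check of this kind needs a single pass to find the standard length and a second pass to flag deviations, which fits the abstract's emphasis on low time complexity. It only suits fixed-width domains such as codes or identifiers, and free-text columns would need a different test.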
Pages: 646-656
Number of pages: 11