A Multi-purpose Data Pre-processing Framework using Machine Learning for Enterprise Data Models

Cited by: 0
Authors
Ramana, Venkata B. [1]
Narsimha, G. [1]
Affiliations
[1] JNTU, Dept Comp Sci & Engn, Hyderabad, India
Keywords
Standard domain length; domain specific rule engine; double differential clustering; change percentage; dependency map;
DOI
10.14569/IJACSA.2021.0120376
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Growth in the data processing industry has automated decision making in domains such as engineering, education, and many fields of research. This growth has also increased dependence on data-driven business decisions made on enterprise-scale data models. The accuracy of such decisions depends solely on the correctness of the data. In the recent past, a good number of data cleaning methods have been proposed by various research efforts. Nonetheless, most of them are criticized for being either too general or too specific. Thus, a multi-purpose yet domain-specific framework for enterprise-scale data pre-processing is in demand. This work therefore proposes a novel data cleaning framework with four components: missing value identification using the standard domain length, with significantly reduced time complexity; domain-specific outlier identification using a customizable rule engine; detailed generic outlier reduction using double differential clustering; and dimensionality reduction using change percentage dependency mapping. The framework's outlier and missing value treatment achieves nearly 99% accuracy on a benchmark dataset.
Pages: 646 - 656
Page count: 11
Related papers
50 in total
  • [1] An Automated Framework for Enterprise Financial Data Pre-processing and Secure Storage
    Alamanda, Sirisha
    Pabboju, Suresh
    Narasimha, G.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (07) : 802 - 812
  • [2] Classification of glucose-level in deionized water using machine learning models and data pre-processing technique
    Tri Ngo Quang
    Tung Nguyen Thanh
    Duc Le Anh
    Huong Pham Thi Viet
    Doanh Sai Cong
    PLOS ONE, 2024, 19 (12):
  • [3] An evaluation of various data pre-processing techniques with machine learning models for water level prediction
    Ervin Shan Khai Tiu
    Yuk Feng Huang
    Jing Lin Ng
    Nouar AlDahoul
    Ali Najah Ahmed
    Ahmed Elshafie
    Natural Hazards, 2022, 110 : 121 - 153
  • [4] An evaluation of various data pre-processing techniques with machine learning models for water level prediction
    Tiu, Ervin Shan Khai
    Huang, Yuk Feng
    Ng, Jing Lin
    AlDahoul, Nouar
    Ahmed, Ali Najah
    Elshafie, Ahmed
    NATURAL HAZARDS, 2022, 110 (01) : 121 - 153
  • [5] Optimizing Machine Learning Data Pre-Processing for Financial Fraud Detection
    Bower, Matthew
    Godasu, Rajesh
    Nyakundi, Nicholas
    Reynolds, Shawn
    2024 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY, EIT 2024, 2024, : 28 - 37
  • [6] DATA PRE-PROCESSING APPROACHES IN PREDICTIVE MACHINE LEARNING OBSERVATIONAL STUDIES
    Friedman, H. S.
    Navaratnam, P.
    Kakehi, S.
    Ray, S.
    Hill, N.
    Kim, I
    Gricar, J.
    VALUE IN HEALTH, 2023, 26 (06) : S284 - S284
  • [7] Big Data Pre-Processing: A Quality Framework
    Taleb, Ikbal
    Dssouli, Rachida
    Serhani, Mohamed Adel
    2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 191 - 198
  • [8] A framework of irregularity enlightenment for data pre-processing in data mining
    Au, Siu-Tong
    Duan, Rong
    Hesar, Siamak G.
    Jiang, Wei
    ANNALS OF OPERATIONS RESEARCH, 2010, 174 (01) : 47 - 66
  • [9] A framework of irregularity enlightenment for data pre-processing in data mining
    Siu-Tong Au
    Rong Duan
    Siamak G. Hesar
    Wei Jiang
    Annals of Operations Research, 2010, 174 : 47 - 66
  • [10] Pre-Processing Data In Weather Monitoring Application By Using Big Data Quality Framework
    Labeeb, Kashshaf
    Chowdhury, Kuraish Bin Quader
    Riha, Rabea Basri
    Abedin, Mohammad Zoynul
    Yesmin, Sarmila
    Khan, Mohammad Nasfikur Rahman
    PROCEEDINGS OF 2020 6TH IEEE INTERNATIONAL WOMEN IN ENGINEERING (WIE) CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (WIECON-ECE 2020), 2020, : 292 - 295