A Multi-purpose Data Pre-processing Framework using Machine Learning for Enterprise Data Models

Cited by: 0
Authors
Ramana, Venkata B. [1]
Narsimha, G. [1]
Affiliations
[1] JNTU, Dept Comp Sci & Engn, Hyderabad, India
Keywords
Standard domain length; domain specific rule engine; double differential clustering; change percentage; dependency map
DOI
10.14569/IJACSA.2021.0120376
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Growth in the data processing industry has automated decision making in domains such as engineering, education, and many fields of research. This growth has also increased the dependence of business decisions on enterprise-scale data models. The accuracy of such decisions depends solely on the correctness of the data. Many data cleaning methods have been proposed in recent research, but most are criticized as either too generic or too specific. There is therefore demand for a multi-purpose yet domain-specific framework for enterprise-scale data pre-processing. This work proposes a novel data cleaning framework with four components: missing value identification using the standard domain length, with significantly reduced time complexity; domain-specific outlier identification using a customizable rule engine; detailed generic outlier reduction using double differential clustering; and dimensionality reduction using change percentage dependency mapping. The outlier and missing-value treatment achieves nearly 99% accuracy on a benchmark dataset.
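The abstract names its components without defining them, so the following is only an illustrative sketch of the first two ideas, not the authors' method. It assumes that the "standard domain length" of a column is the most common string length among its non-empty values, and that a "rule engine" is a set of named predicates supplied by the domain expert. The function names, the sample column, and the rules are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Rule = Callable[[dict], bool]


def standard_domain_length(values: list) -> Optional[int]:
    """Assumed definition: the most common string length among non-empty values."""
    lengths = Counter(
        len(str(v).strip()) for v in values if v is not None and str(v).strip()
    )
    return lengths.most_common(1)[0][0] if lengths else None


def flag_missing(values: list) -> List[bool]:
    """Flag entries that are empty or whose length differs from the standard length.

    A single pass computes the standard length and a second pass flags entries,
    so the cost is linear in the number of values.
    """
    std = standard_domain_length(values)
    flags = []
    for v in values:
        text = "" if v is None else str(v).strip()
        flags.append(text == "" or (std is not None and len(text) != std))
    return flags


def apply_rules(record: dict, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of the domain rules that the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]


# Hypothetical column of postal codes: empty, None, and truncated entries are flagged.
zip_codes = ["50001", "50002", "", None, "5000", "50003"]
missing_flags = flag_missing(zip_codes)

# Hypothetical domain rules; a real deployment would load these from configuration.
rules: Dict[str, Rule] = {
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
    "salary_non_negative": lambda r: r["salary"] >= 0,
}
violations = apply_rules({"age": 150, "salary": 1000}, rules)
```

Keeping the rules as plain named predicates is one way to make the outlier check customizable per domain without changing the engine itself.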
Pages: 646-656
Page count: 11