A novel data repairing approach based on constraints and ensemble learning

Cited by: 4
Authors
Ataeyan, Mahdieh [1]
Daneshpour, Negin [1]
Affiliation
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Data repairing; Noise detection; Functional dependency; Ensemble learning
DOI
10.1016/j.eswa.2020.113511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data repairing is an important task in data mining. This paper proposes a novel data repairing approach based on a combination of constraints and ensemble learning. First, functional dependencies (FDs) are used as constraints to identify inconsistent records. For each FD, all repeated values in the correct records are discovered. Next, noisy attributes in erroneous records are detected using the correct records and the repeated values. To correct the detected noise, a supervised ensemble learning model is constructed for each attribute. The ensemble consists of a Bayes classifier, a decision tree, and a multilayer perceptron (MLP), and majority voting is used as the combination strategy. The proposed approach repairs data automatically, without any user interaction. Moreover, it can detect more than one noisy value in a single record. Experimental results show that our approach outperforms similar repairing algorithms (HoloClean and KATARA) in terms of both precision and recall. (C) 2020 Elsevier Ltd. All rights reserved.
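The detection-then-repair pipeline summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions:

- Records are Python dicts.
- A single FD is checked by grouping records on its left-hand side (LHS).
- Within a group, a right-hand-side (RHS) value that clearly repeats is treated as the trusted value.
- Only the RHS attribute is flagged. The paper's procedure for locating noisy attributes, possibly several per record, is richer than this.

The names `fd_violations`, `majority_vote` and `base_predictions` are hypothetical. The three base-classifier outputs are hard-coded stand-ins, not trained Bayes, decision-tree and MLP models.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, Hashable]


def fd_violations(
    records: Sequence[Record], lhs: Tuple[str, ...], rhs: str
) -> Tuple[List[int], Dict[tuple, Hashable]]:
    """Check the functional dependency lhs -> rhs (simplified sketch).

    Records are grouped by their LHS values. Within a group, an RHS value
    that occurs more than once and strictly more often than any other is
    kept as the group's "repeated" value. If a group holds more than one
    RHS value, the FD is violated: records that disagree with the repeated
    value are flagged; with no clear repeated value, the whole group is.
    Assumption: only the RHS attribute is considered noisy here.
    """
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[tuple(rec[a] for a in lhs)].append(i)

    flagged: List[int] = []
    repeated: Dict[tuple, Hashable] = {}
    for key, idxs in groups.items():
        ranked = Counter(records[i][rhs] for i in idxs).most_common()
        top_value, top_freq = ranked[0]
        if top_freq > 1 and (len(ranked) == 1 or ranked[1][1] < top_freq):
            repeated[key] = top_value
        if len(ranked) > 1:  # the FD is violated inside this group
            if key in repeated:
                flagged.extend(i for i in idxs if records[i][rhs] != top_value)
            else:  # no trustworthy repeated value: the whole group is suspect
                flagged.extend(idxs)
    return sorted(flagged), repeated


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Hard majority vote over the labels predicted by the base classifiers.

    Ties (e.g. three classifiers predicting three different labels) go to
    the earliest classifier in `votes`; this tie rule is an assumption.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    best = max(counts.values())
    return next(v for v in votes if counts[v] == best)


# Toy data: zip -> city should hold, but one city value has a typo.
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Nwe York"},
    {"zip": "94105", "city": "San Francisco"},
]
flagged, repeated = fd_violations(records, ("zip",), "city")

# Hypothetical stand-ins for the predictions of the per-attribute Bayes,
# decision-tree and MLP models on each flagged record's "city" attribute.
base_predictions = {2: ["New York", "New York", "Newark"]}
for i in flagged:
    records[i]["city"] = majority_vote(base_predictions[i])
```

With three base models, plain majority voting can only tie when all three disagree, so a simple tie rule is enough for the sketch. A fuller pipeline would train one such ensemble per attribute, as the abstract describes. For example, scikit-learn's `VotingClassifier` with `voting="hard"` could wrap the three base learners.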
Pages: 18
Related papers
50 in total
  • [41] A novel anomaly detection approach based on ensemble semi-supervised active learning (ADESSA)
    Niu, Zequn
    Guo, Wenjie
    Xue, Jingfeng
    Wang, Yong
    Kong, Zixiao
    Huang, Lu
    COMPUTERS & SECURITY, 2023, 129
  • [42] Improving Colorectal Polyp Classification Based on Physical Examination Data-An Ensemble Learning Approach
    Xie, Xiaolei
    Xing, Jie
    Kong, Nan
    Li, Chong
    Li, Jinlin
    Zhang, Shutian
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2018, 3 (01): 434-441
  • [43] An ensemble learning approach to condition assessment of dissipative CLT connections based on piezoceramic sensor data
    Chen, Lin
    Xiong, Haibei
    Li, Xiuquan
    Lu, Yurong
    Kong, Qingzhao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117
  • [44] A Data-Driven Approach for Lithology Identification Based on Parameter-Optimized Ensemble Learning
    Sun, Zhixue
    Jiang, Baosheng
    Li, Xiangling
    Li, Jikang
    Xiao, Kang
    ENERGIES, 2020, 13 (15)
  • [45] Meteorological Data Fusion Approach for Modeling Crop Water Productivity Based on Ensemble Machine Learning
    Elbeltagi, Ahmed
    Srivastava, Aman
    Kushwaha, Nand Lal
    Juhasz, Csaba
    Tamas, Janos
    Nagy, Attila
    WATER, 2023, 15 (01)
  • [46] LEARNING HOW TO INTERPOLATE FOURIER DATA WITH UNKNOWN AUTOREGRESSIVE STRUCTURE: AN ENSEMBLE-BASED APPROACH
    Kim, Tae Hyung
    Haldar, Justin P.
    CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019: 1471-1475
  • [47] A Novel Online Ensemble Approach for Concept Drift in Data Streams
    Sidhu, Parneeta
    Bhatia, M. P. S.
    Bindal, Aditya
    2013 IEEE SECOND INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP), 2013: 550-555
  • [48] A Novel Tracking Method Based on Ensemble Metric Learning
    Huo, Qirun
    Lu, Yao
    2014 TENTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2014: 176-179
  • [49] A selective ensemble learning approach based on evolutionary algorithm
    Zhang, Yong
    Liu, Bo
    Yu, Jiaxin
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (03): 2365-2373
  • [50] A query interface judging approach based on ensemble learning
    Yan, Zhongmin
    Li, Qingzhong
    Cheng, Meng
    Huang, Qiuyan
    Journal of Computational Information Systems, 2012, 8 (03): 1265-1273