A novel data repairing approach based on constraints and ensemble learning

Cited by: 4
Authors
Ataeyan, Mahdieh [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Data repairing; Noise detection; Functional dependency; Ensemble learning
DOI
10.1016/j.eswa.2020.113511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data repairing is an important task in data mining. This paper proposes a novel data repairing approach that combines constraints with ensemble learning. First, functional dependencies (FDs) are used as constraints to identify inconsistent records. For each FD, all repeated values in the correct records are discovered. Noisy attributes in erroneous records are then detected using the correct records and the repeated values. To correct the detected noise, a supervised ensemble learning model is constructed for each attribute. The ensemble consists of a Bayes classifier, a decision tree, and a multilayer perceptron (MLP), combined by majority voting. The proposed approach repairs data automatically, without any user interaction, and can detect more than one noisy value in a record. Experimental results show that the approach outperforms similar repairing algorithms (HoloClean and KATARA) in terms of both precision and recall. (C) 2020 Elsevier Ltd. All rights reserved.
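The two ideas the abstract names, FD-based inconsistency detection and majority voting over three classifiers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fd_violations` and `majority_vote`, the toy `records`, and the FD `zip -> city` are all assumptions made for illustration, and the three hard-coded predictions merely stand in for the outputs of a Bayes classifier, a decision tree, and an MLP.

```python
from collections import Counter, defaultdict

def fd_violations(records, lhs, rhs):
    """Return indices of records in groups that violate the FD lhs -> rhs."""
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[a] for a in lhs)
        groups[key].add(rec[rhs])
    # An LHS value that maps to more than one RHS value signals an inconsistency.
    return [i for i, rec in enumerate(records)
            if len(groups[tuple(rec[a] for a in lhs)]) > 1]

def majority_vote(predictions):
    """Return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data (hypothetical): the FD zip -> city is broken by the third record.
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Boston"},
    {"zip": "60601", "city": "Chicago"},
]

inconsistent = fd_violations(records, ["zip"], "city")

# Stand-ins for the three classifiers' predictions of the noisy "city" value.
repaired_city = majority_vote(["New York", "New York", "Boston"])
```

Here every record sharing the conflicting `zip` value is flagged; deciding which attribute in those records is actually noisy is a separate step, which the abstract describes as using the correct records and their repeated values.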
Pages: 18
Related papers
50 in total
  • [1] A novel ensemble approach for heterogeneous data with active learning
    Salama, Mohamed
    Abdelkader, Hatem
    Abdelwahab, Amira
    [J]. INTERNATIONAL JOURNAL OF ENGINEERING BUSINESS MANAGEMENT, 2022, 14
  • [2] SWITCH: A novel approach to ensemble learning for heterogeneous data
    Jin, R
    Liu, H
    [J]. MACHINE LEARNING: ECML 2004, PROCEEDINGS, 2004, 3201 : 560 - 562
  • [3] Argumentation Based Joint Learning: A Novel Ensemble Learning Approach
    Xu, Junyi
    Yao, Li
    Li, Le
    [J]. PLOS ONE, 2015, 10 (05):
  • [4] A novel data-driven approach for residential electricity consumption prediction based on ensemble learning
    Chen, Kunlong
    Jiang, Jiuchun
    Zheng, Fangdan
    Chen, Kunjin
    [J]. ENERGY, 2018, 150 : 49 - 60
  • [5] An ensemble-based incremental learning approach to data fusion
    Parikh, Devi
    Polikar, Robi
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2007, 37 (02): 437 - 450
  • [6] Towards Big Data Bayesian Network Learning - an Ensemble Learning Based Approach
    Tang, Yan
    Wang, Yu
    Li, Ling
    Cooper, Kendra M. L.
    [J]. 2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014: 355 - 357
  • [7] An Active Learning Approach for Ensemble-based Data Stream Mining
    Alabdulrahman, Rabaa
    Viktor, Herna
    Paquet, Eric
    [J]. KDIR: PROCEEDINGS OF THE 8TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL. 1, 2016: 275 - 282
  • [8] A novel ensemble learning approach to extract urban impervious surface based on machine learning algorithms using SAR and optical data
    Ahmad, Muhammad Nasar
    Shao, Zhenfeng
    Xiao, Xiongwu
    Fu, Peng
    Javed, Akib
    Ara, Iffat
    [J]. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 132
  • [9] An Ensemble Learning Approach for Data Stream Clustering
    Fathzadeh, Ramin
    Mokhtari, Vahid
    [J]. 2013 21ST IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2013
  • [10] A novel ensemble approach for cancer data classification
    Zhao, Yaou
    Chen, Yuehui
    Zhang, Xueqin
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 2, PROCEEDINGS, 2007, 4492 : 1211 - +