A novel data repairing approach based on constraints and ensemble learning

Cited by: 4
Authors
Ataeyan, Mahdieh [1]
Daneshpour, Negin [1]
Affiliation
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Data repairing; Noise detection; Functional dependency; Ensemble learning
DOI
10.1016/j.eswa.2020.113511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data repairing is an important task in data mining. This paper proposes a novel data repairing approach based on a combination of constraints and ensemble learning. First, functional dependencies (FDs) are used as constraints to identify inconsistent records. For each FD, all repeated values in the correct records are discovered. Next, noisy attributes in erroneous records are detected using the correct records and the repeated values. To correct the detected noise, a supervised ensemble learning model is constructed for each attribute. The ensemble consists of a Bayes classifier, a decision tree, and a multilayer perceptron (MLP), and majority voting is used as the combination strategy. The proposed approach repairs data automatically, without any user interaction, and can detect more than one noisy attribute in a record. Experimental results show that our approach outperforms similar repairing algorithms (HoloClean and KATARA) in terms of both precision and recall. (C) 2020 Elsevier Ltd. All rights reserved.
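The abstract describes the pipeline only at a high level. As a rough, hypothetical sketch (not the authors' algorithm), the two core ideas, flagging records that violate an FD X → Y and combining the ensemble members' predictions by majority vote, might look like this in Python. The function names `fd_violations`, `suspect_cells` and `majority_vote`, the "most repeated RHS value in a violating group is correct" heuristic, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, str]


def fd_violations(
    records: Sequence[Record], lhs: Tuple[str, ...], rhs: str
) -> Dict[tuple, List[int]]:
    """Group records by the left-hand side of the FD lhs -> rhs and return,
    for every LHS value that maps to more than one RHS value, the indices
    of the records in that (inconsistent) group."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[tuple(rec[a] for a in lhs)].append(i)
    return {
        key: idxs
        for key, idxs in groups.items()
        if len({records[i][rhs] for i in idxs}) > 1
    }


def suspect_cells(
    records: Sequence[Record], lhs: Tuple[str, ...], rhs: str
) -> List[Tuple[int, str]]:
    """Hypothetical heuristic: inside each violating group, treat the most
    repeated RHS value as correct and flag (record index, rhs) for every
    record that disagrees with it. Ties go to the value seen first."""
    flagged: List[Tuple[int, str]] = []
    for idxs in fd_violations(records, lhs, rhs).values():
        top, _ = Counter(records[i][rhs] for i in idxs).most_common(1)[0]
        flagged.extend((i, rhs) for i in idxs if records[i][rhs] != top)
    return flagged


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Hard (majority) voting over the labels predicted by the ensemble
    members, e.g. [bayes_pred, tree_pred, mlp_pred]. Ties are broken in
    favour of the earliest member in the sequence (an assumption)."""
    counts = Counter(predictions)
    return max(counts, key=lambda label: (counts[label], -predictions.index(label)))


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New Yrok"},
    {"zip": "60601", "city": "Chicago"},
]
print(suspect_cells(rows, ("zip",), "city"))  # [(2, 'city')]
print(majority_vote(["New York", "New Yrok", "New York"]))  # New York
```

In a full repair step, each ensemble member would be trained to predict the flagged attribute from the record's other attributes, and `majority_vote` would pick the replacement value. One possible off-the-shelf equivalent of the combination step is scikit-learn's `VotingClassifier` with `voting="hard"`, wrapping, for example, `GaussianNB`, `DecisionTreeClassifier` and `MLPClassifier`.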
Pages: 18