A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks

Times cited: 27
Authors
Camilo Corrales, David [1, 2]
Ledezma, Agapito [1]
Carlos Corrales, Juan [2]
Affiliations
[1] Univ Carlos III Madrid, Dept Informat, Madrid 28911, Spain
[2] Univ Cauca, Grp Ingn Telemat, Sector Tulcan, Popayan, Colombia
Keywords
Case-based reasoning; Classification; Regression; Conceptual framework; Knowledge discovery; Support; Similarity; Selection; CBR
DOI
10.1016/j.asoc.2020.106180
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in information technologies (social networks, mobile applications, the Internet of Things, etc.) generate a deluge of digital data, but converting these data into useful information for business decisions is a growing challenge. Exploiting this massive amount of data through the knowledge discovery (KD) process involves identifying valid, novel, potentially useful, and understandable patterns in huge volumes of data. However, preparing the data is a non-trivial refinement task that requires technical expertise in data cleaning methods and algorithms. Consequently, using a suitable data analysis technique is a headache for inexpert users. To address these problems, we propose a case-based reasoning (CBR) system that recommends data cleaning algorithms for classification and regression tasks. In our approach, the problem space is represented by the meta-features of the dataset, its attributes, and the target variable, while the solution space contains the data cleaning algorithms used for each dataset. Cases are represented through a Data Cleaning Ontology. The case retrieval mechanism consists of two phases: filtering and similarity. In the first phase, we define two filter approaches, one based on clustering and one on quartile analysis, which retrieve a reduced number of relevant cases. The second phase ranks the cases retrieved by the filters by scoring the similarity between the new case and each retrieved case. The proposed retrieval mechanism was evaluated by a panel of judges, who scored the similarity between a query case and every case in the case base (ground truth). Measured against the judges' ranking, the retrieval mechanism reaches an average precision of 94.5% in the top 3 (P@3), 84.55% in the top 7 (P@7), and 78.35% in the top 10 (P@10). (C) 2020 Elsevier B.V. All rights reserved.
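The two-phase retrieval (filter, then similarity ranking) and the P@k evaluation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every detail below is an assumption: each case is reduced to a vector of normalized meta-features; the filter is a hypothetical quartile rule (a case survives if at least `min_matches` of its meta-features fall in the same quartile as the query's), and the clustering-based filter is not shown; cosine similarity stands in for whatever measure the paper uses; P@k is read as the overlap between the system's top k and the judges' top k. The case names and values are made-up toy data.

```python
import math
from statistics import quantiles


def quartile_bin(value, cuts):
    """Return the quartile index (0-3) of value, given the cut points Q1, Q2, Q3."""
    for i, cut in enumerate(cuts):
        if value <= cut:
            return i
    return len(cuts)


def quartile_filter(query, case_base, min_matches):
    """Phase 1 (hypothetical rule): keep the cases whose meta-features fall in
    the same quartile as the query's for at least min_matches features."""
    n_features = len(query)
    # Q1, Q2, Q3 of each meta-feature, computed over the whole case base.
    cuts = [
        quantiles([vec[j] for vec in case_base.values()], n=4, method="inclusive")
        for j in range(n_features)
    ]
    kept = {}
    for name, vec in case_base.items():
        matches = sum(
            quartile_bin(vec[j], cuts[j]) == quartile_bin(query[j], cuts[j])
            for j in range(n_features)
        )
        if matches >= min_matches:
            kept[name] = vec
    return kept


def cosine(a, b):
    """Cosine similarity between two meta-feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, candidates):
    """Phase 2: order the filtered cases by decreasing similarity to the query."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)


def precision_at_k(system_ranking, judges_ranking, k):
    """Fraction of the system's top-k cases that also appear in the judges' top-k."""
    return len(set(system_ranking[:k]) & set(judges_ranking[:k])) / k


# Made-up toy data: each case is a vector of three normalized meta-features.
case_base = {
    "A": [0.1, 0.2, 0.9],
    "B": [0.2, 0.1, 0.8],
    "C": [0.9, 0.8, 0.1],
    "D": [0.8, 0.9, 0.2],
    "E": [0.5, 0.5, 0.5],
}
query = [0.15, 0.15, 0.85]
judges_ranking = ["A", "B", "E", "C", "D"]  # hypothetical ground truth

retrieved = quartile_filter(query, case_base, min_matches=2)
ranking = rank(query, retrieved)
print(ranking)                                       # ['A', 'B']
print(precision_at_k(ranking, judges_ranking, k=2))  # 1.0
```

Because only the filtered cases reach the similarity phase, the ranking is computed over a small candidate set. Under this reading of P@k the score is divided by k even when fewer than k cases survive the filter, so in the toy run P@3 drops to 2/3: an overly strict filter lowers precision at larger k.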
Pages: 13
Related papers (50 in total)
  • [31] An Improved Collaborative Filtering Recommendation Algorithm Based on Case-Based Reasoning
    Xing, Lei
    Xu, Cunlu
    Wang, Wei
    Kang, Zefu
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015: 740-744
  • [32] Study on the key technology of personalized recommendation of case-based reasoning
    Li, Fuliang
    Sun, Jieli
    Zhangi, Xia
    International Journal of u- and e- Service, Science and Technology, 2015, 8 (04): 377-382
  • [33] Learning Material Recommendation Based on Case-Based Reasoning Similarity Scores
    Masood, Mona
    Mokmin, Nur Azlina Mohamed
    2ND INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND TECHNOLOGY 2017 (ICAST'17), 2017, 1891
  • [34] Automatic diagnosis with genetic algorithms and case-based reasoning
    Guiu, JMGI
    Ribé, EGI
    Mansilla, EBI
    Fàbrega, XLI
    ARTIFICIAL INTELLIGENCE IN ENGINEERING, 1999, 13 (04): 367-372
  • [35] Hybrid genetic algorithms and case-based reasoning systems
    Ahn, H
    Kim, KJ
    Han, I
    COMPUTATIONAL AND INFORMATION SCIENCE, PROCEEDINGS, 2004, 3314: 922-927
  • [36] Precipitation Data Assimilation System Based on a Neural Network and Case-Based Reasoning System
    Lu, Jing
    Hu, Wei
    Zhang, Xiakun
    INFORMATION, 2018, 9 (05)
  • [37] Distributed Case-based Reasoning System Based on Big Data Platform Hadoop
    Wang, Chong-Yang
    Wang, Hong-Bing
    Liang, Yan-Rui
    2015 INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND INFORMATION SYSTEM (SEIS 2015), 2015: 629-634
  • [38] Cancer classification using case-based reasoning classifier
    Machcha, Lilybert
    Lhattacharya, Prabir
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006: 3602+
  • [39] An ordinal model for case-based reasoning in a classification task
    Mechitov, AI
    Moshkovich, HM
    Bradley, JH
    Schellenberger, RE
    INTERNATIONAL JOURNAL OF EXPERT SYSTEMS, 1996, 9 (02): 225-242
  • [40] Conversational Case-Based Reasoning in Medical Classification and Diagnosis
    McSherry, David
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PROCEEDINGS, 2009, 5651: 116-125