A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks

Times cited: 27
Authors
Camilo Corrales, David [1, 2]
Ledezma, Agapito [1]
Carlos Corrales, Juan [2]
Affiliations
[1] Univ Carlos III Madrid, Dept Informat, Madrid 28911, Spain
[2] Univ Cauca, Grp Ingn Telemat, Sector Tulcan, Popayan, Colombia
Keywords
Case-based reasoning; Classification; Regression; CONCEPTUAL-FRAMEWORK; KNOWLEDGE DISCOVERY; SUPPORT; SIMILARITY; SELECTION; CBR
DOI
10.1016/j.asoc.2020.106180
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in information technologies (social networks, mobile applications, the Internet of Things, etc.) have produced a deluge of digital data, but converting these data into useful information for business decisions is a growing challenge. Exploiting this massive amount of data through the knowledge discovery (KD) process involves identifying valid, novel, potentially useful, and understandable patterns in a huge volume of data. However, preparing the data is a non-trivial refinement task that requires technical expertise in data cleaning methods and algorithms. Consequently, selecting a suitable data analysis technique is daunting for non-expert users. To address these problems, we propose a case-based reasoning (CBR) system that recommends data cleaning algorithms for classification and regression tasks. In our approach, the problem space is represented by the meta-features of the dataset, its attributes, and the target variable, and the solution space contains the data cleaning algorithms applied to each dataset. Cases are represented through a Data Cleaning Ontology. The case retrieval mechanism consists of two phases: filtering and similarity. In the first phase, we define two filter approaches, one based on clustering and one on quartile analysis, which retrieve a reduced number of relevant cases. The second phase ranks the cases retrieved by the filters by scoring the similarity between the new case and each retrieved case. The proposed retrieval mechanism was evaluated against a panel of judges, who scored the similarity between a query case and every case in the case base (ground truth). Measured against the judges' ranking, the retrieval mechanism reaches an average precision of 94.5% in the top 3 (P@3), 84.55% in the top 7 (P@7), and 78.35% in the top 10 (P@10). (C) 2020 Elsevier B.V. All rights reserved.
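The abstract describes a two-phase retrieval (filter the case base, then rank the surviving cases by similarity) evaluated with precision at k. The Python sketch below illustrates that pipeline under loudly stated assumptions; none of these details are the paper's definitions. The three meta-features (number of instances, number of attributes, missing-value ratio), the toy case base, the filter rule "same or adjacent quartile on at least `min_shared` meta-features", the similarity 1 / (1 + Euclidean distance) over min-max normalised features, and the judges' ranking are all hypothetical. The clustering-based filter is not shown, plain dicts stand in for the Data Cleaning Ontology, and P@k is assumed to be the overlap between the system's top k and the judges' top k.

```python
from math import sqrt
from statistics import quantiles

# Hypothetical case: (meta-feature vector, data cleaning algorithms applied to that dataset).
Case = tuple[list[float], list[str]]


def quartile_bin(value: float, cuts: list[float]) -> int:
    """Index 0..3 of the quartile interval `value` falls in, given cut points Q1, Q2, Q3."""
    return sum(value > c for c in cuts)


def quartile_filter(query: list[float], cases: dict[str, Case], min_shared: int) -> list[str]:
    """Assumed rule: keep cases in the same or an adjacent quartile as the query on at
    least `min_shared` meta-features. quantiles() needs at least 2 cases."""
    n_feats = len(query)
    cuts = [quantiles([vec[i] for vec, _ in cases.values()], n=4) for i in range(n_feats)]
    q_bins = [quartile_bin(query[i], cuts[i]) for i in range(n_feats)]
    kept = []
    for name, (vec, _) in cases.items():
        shared = sum(abs(quartile_bin(vec[i], cuts[i]) - q_bins[i]) <= 1 for i in range(n_feats))
        if shared >= min_shared:
            kept.append(name)
    return kept


def similarity(a: list[float], b: list[float], lo: list[float], hi: list[float]) -> float:
    """Hypothetical similarity: 1 / (1 + Euclidean distance) over min-max normalised features."""
    dist2 = 0.0
    for x, y, low, high in zip(a, b, lo, hi):
        span = (high - low) or 1.0  # avoid division by zero on constant features
        dist2 += ((x - y) / span) ** 2
    return 1.0 / (1.0 + sqrt(dist2))


def retrieve(query: list[float], cases: dict[str, Case], k: int, min_shared: int) -> list[str]:
    """Phase 1: filter the case base. Phase 2: rank survivors by similarity, keep the top k."""
    candidates = quartile_filter(query, cases, min_shared)
    vecs = [vec for vec, _ in cases.values()]
    lo = [min(col) for col in zip(*vecs)]
    hi = [max(col) for col in zip(*vecs)]
    # Highest similarity first; ties broken by case name for a deterministic order.
    ranked = sorted(candidates, key=lambda n: (-similarity(query, cases[n][0], lo, hi), n))
    return ranked[:k]


def precision_at_k(system: list[str], judges: list[str], k: int) -> float:
    """Assumed P@k: share of the system's top k that also appear in the judges' top k."""
    relevant = set(judges[:k])
    return sum(name in relevant for name in system[:k]) / k


# Toy case base: [instances, attributes, missing-value ratio] -> cleaning algorithms.
cases = {
    "A": ([100.0, 5.0, 0.00], ["impute_mean"]),
    "B": ([120.0, 6.0, 0.05], ["impute_mean", "remove_outliers"]),
    "C": ([5000.0, 40.0, 0.30], ["knn_impute"]),
    "D": ([8000.0, 60.0, 0.40], ["knn_impute", "smote"]),
    "E": ([300.0, 10.0, 0.10], ["remove_duplicates"]),
}
query = [110.0, 5.0, 0.02]
top = retrieve(query, cases, k=3, min_shared=2)
print(top)                 # ['A', 'B', 'E']  (C and D are removed by the filter)
print(cases[top[0]][1])    # ['impute_mean']  (solution of the most similar case)
print(precision_at_k(top, ["A", "B", "C", "E", "D"], k=3))  # 0.6666666666666666
```

Filtering before scoring keeps the similarity computation confined to a small candidate set, which is the motivation the abstract gives for the first phase; the adjacent-quartile tolerance is only one plausible way to avoid discarding near neighbours on small case bases.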
Pages: 13