Pattern selection approaches for the logical analysis of data considering the outliers and the coverage of a pattern

Cited by: 12
Authors
Han, Jeong [1]
Kim, Norman [2]
Yum, Bong-Jin [1]
Jeong, Myong K. [2]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, Taejon 305701, South Korea
[2] Rutgers State Univ, RUTCOR Rutgers Ctr Operat Res, Piscataway, NJ USA
Keywords
Classification; Logical analysis of data; Pattern selection; Set covering problem
DOI
10.1016/j.eswa.2011.04.189
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The logical analysis of data (LAD) is one of the most promising data mining methods developed to date for extracting knowledge from data. The key feature of the LAD is the capability of detecting hidden patterns in the data. Because patterns are basically combinations of certain attributes, they can be used to build a decision boundary for classification in the LAD by providing important information to distinguish observations in one class from those in the other. The use of patterns may result in a more stable performance in terms of being able to classify both positive and negative classes due to their robustness to measurement errors. The LAD technique, however, tends to choose too many patterns by solving a set covering problem to build a classifier; this is especially the case when outliers exist in the data set. In the set covering problem of the LAD, each observation should be covered by at least one pattern, even though the observation is an outlier. Thus, existing approaches tend to select too many patterns to cover these outliers, resulting in the problem of overfitting. Here, we propose new pattern selection approaches for LAD that take both outliers and the coverage of a pattern into account. The proposed approaches can avoid the problem of overfitting by building a sparse classifier. The performances of the proposed pattern selection approaches are compared with existing LAD approaches using several public data sets. The computational results show that the sparse classifiers built on the patterns selected by the proposed new approaches yield an improved classification performance compared to the existing approaches, especially when outliers exist in the data set. (C) 2011 Elsevier Ltd. All rights reserved.
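The effect the abstract describes, where forcing every observation to be covered pulls in extra patterns that exist only to cover outliers, can be illustrated with a small sketch. This is not the authors' formulation: it is a generic greedy partial set cover, and the function name `greedy_pattern_selection`, the parameter `max_uncovered`, and the toy patterns `P1` to `P4` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_pattern_selection(coverage: Dict[str, Set[int]],
                             observations: Set[int],
                             max_uncovered: int = 0) -> List[str]:
    """Greedy partial set cover (illustrative, not the paper's method).

    Selects patterns until at most `max_uncovered` observations
    remain uncovered. With max_uncovered=0 this is ordinary greedy
    set covering, where every observation must be covered.
    """
    uncovered = set(observations)
    selected: List[str] = []
    while len(uncovered) > max_uncovered:
        # Pick the pattern covering the most still-uncovered observations;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # no pattern covers any of the remaining observations
        selected.append(best)
        uncovered -= gain
    return selected


# Hypothetical toy data: each pattern maps to the observations it covers.
coverage = {
    "P1": {1, 2, 3, 4},
    "P2": {4, 5, 6},
    "P3": {7},       # covers only observation 7, a presumed outlier
    "P4": {2, 6},
}
observations = {1, 2, 3, 4, 5, 6, 7}

# Full cover: every observation, including the outlier, must be covered.
full = greedy_pattern_selection(coverage, observations, max_uncovered=0)
# Relaxed cover: one observation may be left uncovered.
sparse = greedy_pattern_selection(coverage, observations, max_uncovered=1)

print(full)    # ['P1', 'P2', 'P3']
print(sparse)  # ['P1', 'P2']
```

In this toy case the full cover needs a third pattern solely for observation 7, while allowing one uncovered observation yields a two-pattern classifier. How the paper itself identifies outliers and weighs pattern coverage is not specified in the abstract, so the relaxation used here is only an assumption.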
Pages: 13857-13862
Number of pages: 6