Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter?

Cited by: 0
Authors
M. Julia Flores
José A. Gámez
Ana M. Martínez
José M. Puerta
Affiliation
[1] University of Castilla-La Mancha,Computer Systems Department, Intelligent Systems & Data Mining—SIMD, I3A
Source
Applied Intelligence | 2011 / Vol. 34
Keywords
Discretization; Bayesian classifiers; AODE; Naive Bayes
DOI
Not available
Abstract
Within the framework of Bayesian networks (BNs), most classifiers assume that the variables involved are discrete, but this assumption rarely holds in real problems. Despite the loss of information that discretization entails, it is a direct, easy-to-use mechanism that can offer some benefits: it sometimes improves the run time of certain algorithms; it reduces the value set and hence the noise that might be present in the data; and some Bayesian methods can only deal with discrete variables. Hence, even though there are many ways to deal with continuous variables other than discretization, it is still commonly used. This paper presents a study of the impact of using different discretization strategies on a set of representative BN classifiers, with a significant sample consisting of 26 datasets. For this comparison, we have chosen Naive Bayes (NB) together with several semi-Naive Bayes classifiers: Tree-Augmented Naive Bayes (TAN), the k-Dependence Bayesian classifier (KDB), Aggregating One-Dependence Estimators (AODE) and Hybrid AODE (HAODE). We have also included an augmented Bayesian network created by using a hill climbing algorithm (BNHC). With this comparison we analyse to what extent the type of discretization method affects classifier performance in terms of accuracy and bias-variance decomposition. Our main conclusion is that even if a discretization method produces different results for a particular dataset, it does not really have an effect when classifiers are being compared. That is, given a set of datasets, accuracy values might vary but the classifier ranking is generally maintained. This is a very useful outcome: if the type of discretization applied is not decisive, future experiments can be d times faster, d being the number of discretization methods considered.
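
The idea of the comparison can be illustrated with a minimal Python sketch. It is not the authors' experimental code: the two discretization methods shown (equal-width and equal-frequency binning) are assumed here as common examples, the helper names are hypothetical, and the attribute values and scores are invented placeholders rather than results from the paper.

```python
from bisect import bisect_right

def equal_width_cuts(values, bins):
    """Cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]

def equal_frequency_cuts(values, bins):
    """Cut points that put roughly the same number of values in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]

def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]

def ranking(scores):
    """Classifier names ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)

def same_ranking(table):
    """True if every discretization method yields the same classifier order."""
    orders = [ranking(scores) for scores in table.values()]
    return all(order == orders[0] for order in orders)

# Toy attribute values, invented for illustration (note the outlier).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
ew = discretize(x, equal_width_cuts(x, 3))
ef = discretize(x, equal_frequency_cuts(x, 3))

# Placeholder scores for hypothetical classifiers, NOT results from the paper.
toy_scores = {
    "equal_width": {"clf_a": 0.70, "clf_b": 0.75, "clf_c": 0.80},
    "equal_frequency": {"clf_a": 0.72, "clf_b": 0.78, "clf_c": 0.81},
}
print(ew, ef, same_ranking(toy_scores))
```

The two methods bin the same attribute very differently when an outlier is present, yet the check in `same_ranking` only asks whether the order of the classifiers changes, which is the question the abstract addresses.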
Pages: 372 - 385
Number of pages: 13