Classification performance of data mining algorithms applied to breast cancer data

Cited by: 0
Authors
Santos, Vitor [1]
Datia, Nuno [1]
Pato, M. P. M. [1]
Affiliations
[1] ISEL, Lisbon, Portugal
Keywords
ROC CURVE; AREA;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study how several classification algorithms perform when applied to a breast cancer dataset. The challenge is to develop models for computer-aided detection (CAD) capable of classifying, at early stages, masses spotted in X-ray images. The dataset was made available for KDD Cup 2008. The imbalanced nature of the dataset and its high-dimensional feature space pose problems for the modelling, which are tackled using dimension reduction techniques. The algorithms are compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which plots the true-positive rate (TPR) against the false-positive rate (FPR). Other metrics, such as patient sensitivity and FPR, are also used and discussed. We find that the Naive Bayes classifier achieved the best performance irrespective of the combination of datasets, and that it allows controlled trade-offs between false positives and false negatives.
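As an illustration of the comparison metric named in the abstract, the following minimal sketch computes ROC points (FPR, TPR) by sweeping a decision threshold over classifier scores and then integrates them with the trapezoidal rule to obtain the AUC. It is not the authors' code: the function names `roc_points` and `auc` and the toy labels and scores are assumptions made up for this example, and the KDD Cup 2008 data is not used.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: 1 for positive (e.g. malignant), 0 for negative.
    scores: higher means "more likely positive".
    Hypothetical helper written for illustration only.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy, imbalanced example: 3 positives among 8 candidates (invented data).
    toy_labels = [1, 0, 1, 0, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.35, 0.1]
    print(auc(roc_points(toy_labels, toy_scores)))
```

Each point on the curve corresponds to one threshold, so choosing a threshold is what trades false positives against false negatives; the AUC summarises the ranking quality across all thresholds.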
Pages: 307 - 312
Number of pages: 6