Canonical Correlation Analysis for Data Reduction in Data Mining Applied to Predictive Models for Breast Cancer Recurrence

Cited by: 0
Authors
Razavi, Amir Reza [1]
Gill, Hans [1]
Ahlfeldt, Hans [1]
Shahsavar, Nosrat [1]
Affiliation
[1] Linkoping Univ, Dept Biomed Engn, Univ Hosp, S-58185 Linkoping, Sweden
Keywords
Data Mining; Artificial Neural Network (ANN); Canonical Correlation Analysis (CCA); Dimension Reduction; Breast Cancer
DOI
Not available
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Data mining methods can be used to extract specific medical knowledge from relevant data, such as important predictors of breast cancer recurrence. However, when the data contain a large number of variables, the important variables must first be identified and selected. In this study we present a preprocessing method for selecting important variables in a dataset prior to building a predictive model. Data from 5787 female patients were analysed. To cover more predictors and obtain a better assessment of the outcomes, data were retrieved from three different registers: the regional breast cancer register, the tumour marker register, and the cause-of-death register. After information about the selected predictors and outcomes had been retrieved from these registers, the raw data were cleaned by applying a set of logical rules. Domain experts then selected the predictors assumed to be important for breast cancer recurrence. Canonical Correlation Analysis (CCA) was subsequently applied as a dimension reduction technique, with the aim of preserving the character of the original data. An Artificial Neural Network (ANN) was applied to the resulting dataset in two different analyses with the same settings. The performance of the predictive models was assessed by ten-fold cross-validation. The results showed an increase in prediction accuracy and a reduction in the mean absolute error.
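The pipeline summarized in the abstract (CCA-based reduction of candidate predictors, then an ANN evaluated by ten-fold cross-validation on accuracy and mean absolute error) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, all function names are hypothetical, a predictor is retained when the absolute value of its structure coefficient (its correlation with the first canonical variate) is at least 0.3 (an assumed rule of thumb), and the network size, learning rate and epoch count are arbitrary. Comparing all predictors against the CCA-selected subset is likewise only an assumed illustration of "two analyses with the same settings". With a single binary outcome column, the first canonical correlation reduces to the multiple correlation between the predictors and the outcome; the same function also accepts several outcome columns.

```python
import numpy as np

# Hypothetical helper: first canonical pair between predictors X (n x p) and outcomes Y (n x q).
def cca_first_pair(X, Y, reg=1e-6):
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])  # small ridge for numerical stability
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)  # Sxx = Lx Lx^T, Syy = Ly Ly^T
    # Whitened cross-covariance K = Lx^-1 Sxy Ly^-T; its leading singular pair gives the first canonical pair.
    K = np.linalg.solve(Ly, np.linalg.solve(Lx, Sxy).T).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    a = np.linalg.solve(Lx.T, U[:, 0])  # canonical weights for X
    b = np.linalg.solve(Ly.T, Vt[0])    # canonical weights for Y
    return a, b, s[0]                   # s[0] is the first canonical correlation

# Assumed selection rule: keep predictors whose structure coefficient,
# i.e. correlation with the first canonical variate, satisfies |value| >= threshold.
def select_by_cca(X, Y, threshold=0.3):
    a, _, r = cca_first_pair(X, Y)
    u = (X - X.mean(axis=0)) @ a
    loadings = np.array([np.corrcoef(X[:, j], u)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(loadings) >= threshold).tolist(), float(r)

# Hypothetical ANN: one tanh hidden layer, sigmoid output, full-batch gradient descent on mean cross-entropy.
def train_mlp(X, y, hidden=8, lr=0.5, epochs=300, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        d_out = (p - y) / len(y)                      # gradient w.r.t. output logits
        d_hid = np.outer(d_out, W2) * (1.0 - H ** 2)  # backpropagated through tanh (uses pre-update W2)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))

# Ten-fold cross-validation with identical ANN settings in every fold; returns mean accuracy and MAE.
def cross_validate(X, y, k=10, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs, maes = [], []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)  # scale using the training fold only
        p = predict_proba(train_mlp((X[train] - mu) / sd, y[train]), (X[test] - mu) / sd)
        accs.append(np.mean((p >= 0.5) == y[test]))
        maes.append(np.mean(np.abs(p - y[test])))  # MAE of predicted probabilities vs 0/1 labels
    return float(np.mean(accs)), float(np.mean(maes))

# Synthetic stand-in data (NOT the registry data): 12 candidate predictors, 3 informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 12))
eta = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.5 * X[:, 2]
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-eta))).astype(float)  # binary "recurrence" label

selected, r = select_by_cca(X, y[:, None])
acc_all, mae_all = cross_validate(X, y)               # assumed analysis 1: all candidate predictors
acc_sel, mae_sel = cross_validate(X[:, selected], y)  # assumed analysis 2: CCA-selected predictors
print("selected:", selected, "canonical r:", round(r, 3))
print("all predictors: acc=%.3f mae=%.3f" % (acc_all, mae_all))
print("CCA subset:     acc=%.3f mae=%.3f" % (acc_sel, mae_sel))
```

Because the CCA screening here runs once on the full dataset before cross-validation, mirroring its description as a preprocessing step, the cross-validated estimates for the selected subset may be slightly optimistic; repeating the selection inside each training fold avoids that.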
Pages: 175-180
Page count: 6
Related papers (50 in total)
  • [31] Data mining techniques applied to predictive modeling of the knurling process
    Feng, CXJ
    Wang, XFD
    IIE TRANSACTIONS, 2004, 36 (03) : 253 - 263
  • [32] Data mining for generating predictive models of local hydrology
    Hewett, R
    APPLIED INTELLIGENCE, 2003, 19 (03) : 157 - 170
  • [34] High-throughput data dimension reduction via seeded canonical correlation analysis
    Im, Yunju
    Gang, HeyIn
    Yoo, Jae Keun
    JOURNAL OF CHEMOMETRICS, 2015, 29 (03) : 193 - 199
  • [35] Analysis of cancer data: a data mining approach
    Delen, Dursun
    EXPERT SYSTEMS, 2009, 26 (01) : 100 - 112
  • [36] Research on Data Mining Method for Breast Cancer Case Data
    Cao, Yanning
    Zhang, Xiaoshu
    CLOUD COMPUTING AND SECURITY, PT II, 2018, 11064 : 71 - 78
  • [37] Comparative Study on Data Mining Techniques Applied to Breast Cancer Gene Expression Profiles
    Mosquim Junior, Sergio
    de Oliveira, Juliana
    PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 3: BIOINFORMATICS, 2017, : 168 - 175
  • [38] The feasibility of constructing a predictive outcome model for breast cancer using the tools of data mining
    Jonsdottir, Thora
    Hvannberg, Ebba Thora
    Sigurdsson, Helgi
    Sigurdsson, Sven
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (01) : 108 - 118
  • [39] Exploratory Data Analysis on Breast cancer dataset about Survivability and Recurrence
    Sweetlin, E. Jenifer
    Saudia, S.
    ICSPC'21: 2021 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION (ICPSC), 2021, : 304 - 308
  • [40] Decision tree based predictive models for breast cancer survivability on imbalanced data
    Liu Ya-Qin
    Wang Cheng
    Zhang Lu
    2009 3RD INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1-11, 2009, : 312 - 315