Cancer survival classification using integrated data sets and intermediate information

Cited by: 18
Authors
Kim, Shinuk [1, 2, 3]
Park, Taesung [2]
Kon, Mark [3]
Affiliations
[1] Sangmyung Univ, Coll Liberal Arts, Cheonan 330729, Chungnam, South Korea
[2] Seoul Natl Univ, Dept Stat, Seoul 151747, South Korea
[3] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
Keywords
Machine learning algorithm; Integration of data sets; Intermediate information; Survival time classification; GENE-EXPRESSION; HUMAN BREAST; PATIENT SURVIVAL; UP-REGULATION; CELL-GROWTH; MICRORNA; APOPTOSIS; PROFILES; PROTEIN; IDENTIFICATION
DOI
10.1016/j.artmed.2014.06.003
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Objective: Although numerous studies on cancer survival have been published, increasing the prediction accuracy of survival classes remains a challenge. Integrating different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. We therefore proposed a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with the Cox proportional hazards regression model (FSCOX) to improve the prediction of cancer survival time.

Methods: FSCOX provides intermediate survival information, which is usually discarded when survival is separated into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers: first, the existing ML methods support vector machine (SVM) and random forest (RF); second, a new median-based classifier using FSCOX (FSCOX_median); and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The last data set included features selected either from the combined miRNA/mRNA profile or independently from the miRNA and mRNA profiles (independent feature selection, IFS).

Results: In the ovarian data set, a balanced set of 22 short-term and 22 long-term survivors, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly, in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS was 75.51% (RF), 87.76% (SVM), 85.71% (FSCOX_median), and 85.71% (FSCOX_SVM). These results are higher than those obtained using miRNA expression or mRNA expression alone. In addition, we predicted 16 hsa-miR-23b and hsa-miR-27b target genes in the ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles.

Conclusion: Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival time together with the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. (C) 2014 Elsevier B.V. All rights reserved.
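The abstract's independent feature selection (IFS) step picks features from the miRNA and mRNA profiles separately and then combines them at the feature level. The sketch below illustrates only that idea. It is not the authors' implementation: the scoring rule (absolute difference of class means) is a simple stand-in for the SVM- and Cox-based selection the abstract names, and the function names, toy values, and `k` parameter are all hypothetical.

```python
from statistics import mean

def score_features(rows, labels):
    """Score each feature column by |mean(short-term) - mean(long-term)|.
    Assumption: a toy stand-in for the paper's SVM/Cox-based selection."""
    scores = []
    for j in range(len(rows[0])):
        short = [r[j] for r, y in zip(rows, labels) if y == 0]
        long_ = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append(abs(mean(short) - mean(long_)))
    return scores

def top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]

def independent_selection(mirna, mrna, labels, k):
    """Select k features from each profile separately, then concatenate
    the selected columns sample by sample (feature-level integration)."""
    mi_idx = top_k(score_features(mirna, labels), k)
    m_idx = top_k(score_features(mrna, labels), k)
    combined = [
        [r_mi[j] for j in mi_idx] + [r_m[j] for j in m_idx]
        for r_mi, r_m in zip(mirna, mrna)
    ]
    return combined, mi_idx, m_idx

# Hypothetical toy data: 4 samples, labels 0 = short-term, 1 = long-term.
labels = [0, 0, 1, 1]
mirna = [[1.0, 5.0], [1.2, 5.1], [3.0, 5.0], [3.1, 4.9]]
mrna = [[2.0, 0.1, 7.0], [2.1, 0.2, 9.0], [2.0, 0.9, 7.1], [2.1, 1.0, 9.1]]

combined, mi_idx, m_idx = independent_selection(mirna, mrna, labels, k=1)
print(mi_idx, m_idx)
print(combined)
```

Selecting from each profile separately guarantees that both data types are represented in the combined feature set, whereas selecting from the pooled miRNA/mRNA profile could let one data type dominate.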
Pages: 23-31
Number of pages: 9