Cancer survival classification using integrated data sets and intermediate information

Cited by: 18
Authors
Kim, Shinuk [1, 2, 3]
Park, Taesung [2]
Kon, Mark [3]
Affiliations
[1] Sangmyung Univ, Coll Liberal Arts, Cheonan 330729, Chungnam, South Korea
[2] Seoul Natl Univ, Dept Stat, Seoul 151747, South Korea
[3] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
Keywords
Machine learning algorithm; Integration of data sets; Intermediate information; Survival time classification; GENE-EXPRESSION; HUMAN BREAST; PATIENT SURVIVAL; UP-REGULATION; CELL-GROWTH; MICRORNA; APOPTOSIS; PROFILES; PROTEIN; IDENTIFICATION;
DOI
10.1016/j.artmed.2014.06.003
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: Although numerous studies on cancer survival have been published, increasing the prediction accuracy of survival classes remains a challenge. Integrating different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. We therefore propose a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with the Cox proportional hazards regression model (FSCOX) to improve the prediction of cancer survival time.
Methods: FSCOX provides intermediate survival information, which is usually discarded when survival is separated into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers: first, the existing ML methods support vector machine (SVM) and random forest (RF); second, a new median-based classifier using FSCOX (FSCOX_median); and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The last data set included features selected either from the combined miRNA/mRNA profile or independently from the miRNA and mRNA profiles (independent feature selection, IFS).
Results: In the ovarian data set, which was balanced with 22 short-term and 22 long-term survivors, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM. These accuracies are higher than those obtained using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly, in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS was 75.51% (RF), 87.76% (SVM), 85.71% (FSCOX_median), and 85.71% (FSCOX_SVM). These results are higher than those obtained using miRNA expression or mRNA expression alone. In addition, we predicted 16 hsa-miR-23b and hsa-miR-27b target genes in the ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles.
Conclusion: Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival time and the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. (C) 2014 Elsevier B.V. All rights reserved.
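The idea of selecting features independently from each data set and then combining them at the feature level can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the relevance score is a simple standardized difference of class means used as a stand-in for the Cox-based selection described in the abstract, the function names (score_feature, top_k, independent_feature_selection) are hypothetical, and the data are synthetic.

```python
import random
import statistics


def score_feature(values, labels):
    """Stand-in relevance score (an assumption, NOT the Cox-based score):
    absolute difference of class means divided by the overall std. dev."""
    short = [v for v, y in zip(values, labels) if y == 0]
    long_ = [v for v, y in zip(values, labels) if y == 1]
    spread = statistics.pstdev(values) or 1.0  # guard against zero spread
    return abs(statistics.mean(short) - statistics.mean(long_)) / spread


def top_k(matrix, labels, k):
    """Return the column indices of the k highest-scoring features."""
    n_features = len(matrix[0])
    scores = [
        score_feature([row[j] for row in matrix], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def independent_feature_selection(mirna, mrna, labels, k_mirna, k_mrna):
    """Select features from each data set separately, then concatenate
    the selected columns sample by sample (feature-level integration)."""
    mi_idx = top_k(mirna, labels, k_mirna)
    m_idx = top_k(mrna, labels, k_mrna)
    combined = [
        [mi[j] for j in mi_idx] + [m[j] for j in m_idx]
        for mi, m in zip(mirna, mrna)
    ]
    return combined, mi_idx, m_idx


# Synthetic example: 22 short-term (0) and 22 long-term (1) samples.
# Column 2 of the miRNA matrix and column 4 of the mRNA matrix are made
# informative by construction; every other column is pure noise.
rng = random.Random(0)
labels = [0] * 22 + [1] * 22
mirna = [[rng.gauss(0, 1) + (10 * y if j == 2 else 0) for j in range(5)]
         for y in labels]
mrna = [[rng.gauss(0, 1) + (10 * y if j == 4 else 0) for j in range(6)]
        for y in labels]

combined, mi_idx, m_idx = independent_feature_selection(mirna, mrna, labels, 2, 2)
print(len(combined), len(combined[0]), mi_idx, m_idx)
```

The combined matrix could then be passed to any classifier. Selecting from each data set separately guarantees that both data types are represented in the final feature set, which a single ranking over the pooled features would not.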
Pages: 23-31
Number of pages: 9