stepwiseCM: An R Package for Stepwise Classification of Cancer Samples Using Multiple Heterogeneous Data Sets

Authors
Obulkasim, Askar [1]
van de Wiel, Mark A. [2]
Affiliations
[1] Vrije Univ Amsterdam, Med Ctr, Dept Epidemiol & Biostat, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Dept Math, Amsterdam, Netherlands
Keywords
classification; data integration; high-dimensional data; R package
DOI
10.4137/CIN.S13075
Chinese Library Classification
R73 [Oncology]
Discipline classification code
100214
Abstract
This paper presents the R/Bioconductor package stepwiseCM, which classifies cancer samples efficiently using two heterogeneous data sets. The algorithm captures the distinct classification power of the two data types without actually combining them. The package is suited to classification problems in which two different types of data are available for the same samples: one type is measured on all samples, the other on only some of them. The former is easy to collect and/or relatively cheap (eg, clinical covariates) compared with the latter (high-dimensional data, eg, gene expression). An additional application for which stepwiseCM has also proven useful is the combination of two high-dimensional data types, eg, DNA copy number and mRNA expression. The package includes functions that project the neighborhood information from one data space onto the other in order to identify the group of samples most likely to benefit from measuring the second type of covariates. The two heterogeneous data spaces are connected by indirect mapping. The crucial difference between the stepwise classification strategy implemented in this package and existing packages is that our approach aims to be cost-efficient: it avoids measuring additional covariates, which may be expensive or patient-unfriendly, for a potentially large subgroup of individuals. Moreover, diagnostic test results for these individuals would be available quickly, which may reduce waiting times and hence lower patients' distress. The improvement described remedies the key limitations of existing packages and facilitates the use of the stepwiseCM package in diverse applications.
Pages: 1-11
Page count: 11
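Illustrative sketch
The stepwise idea in the abstract (classify everyone with the cheap data type, then measure the expensive type only for samples likely to benefit) can be illustrated with a small, self-contained Python sketch. It is not the stepwiseCM API and not the authors' algorithm: the function names, the k-nearest-neighbor rule, the score definition, and the toy data are all assumptions made for illustration. It assumes that, for each training sample, we already know whether a stage-1 (cheap data) and a stage-2 (expensive data) classifier predicted it correctly, for example from cross-validation.

```python
import math

# Hypothetical helpers for illustration only; none of these names come from stepwiseCM.

def k_nearest(query, points, k):
    """Indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

def reclassification_score(query, train_x, stage1_correct, stage2_correct, k=3):
    """Local accuracy of stage 2 minus local accuracy of stage 1,
    computed over the query's k nearest training neighbors in the cheap data space."""
    idx = k_nearest(query, train_x, k)
    acc1 = sum(stage1_correct[i] for i in idx) / len(idx)
    acc2 = sum(stage2_correct[i] for i in idx) / len(idx)
    return acc2 - acc1

def route(queries, train_x, stage1_correct, stage2_correct, k=3, threshold=0.0):
    """Send a sample to stage 2 only if its neighbors gained from the second data type."""
    return [
        "stage2"
        if reclassification_score(q, train_x, stage1_correct, stage2_correct, k) > threshold
        else "stage1"
        for q in queries
    ]

# Toy cheap-data coordinates (assumed, e.g. two standardized clinical covariates).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
# 1 = the classifier got this training sample right, 0 = wrong (assumed values).
stage1_correct = [1, 1, 1, 0, 0, 1]
stage2_correct = [1, 1, 0, 1, 1, 1]

new_samples = [(0.05, 0.05), (1.0, 1.05)]
decisions = route(new_samples, train_x, stage1_correct, stage2_correct)
print(decisions)
```

In this toy setting, the threshold plays the role of a cost control: raising it sends fewer samples on to the expensive measurement, at the price of relying more often on the cheap data alone.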