stepwiseCM: An R Package for Stepwise Classification of Cancer Samples Using Multiple Heterogeneous Data Sets

Authors
Obulkasim, Askar [1]
van de Wiel, Mark A. [2]
Affiliations
[1] Vrije Univ Amsterdam, Med Ctr, Dept Epidemiol & Biostat, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Dept Math, Amsterdam, Netherlands
Keywords
classification; data integration; high-dimensional data; R package
DOI
10.4137/CIN.S13075
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
This paper presents the R/Bioconductor package stepwiseCM, which classifies cancer samples efficiently using two heterogeneous data sets. The algorithm captures the distinct classification power of the two data types without actually combining them. The package is suited to classification problems in which two different types of data are available on the same samples. One data type is measured on all samples, and the other on only some of them. The former is easy to collect and/or relatively cheap (eg, clinical covariates) compared with the latter (high-dimensional data, eg, gene expression). An additional application for which stepwiseCM has proven useful is the combination of two high-dimensional data types, eg, DNA copy number and mRNA expression. The package includes functions that project the neighborhood information in one data space onto the other, in order to identify a group of samples likely to benefit most from measuring the second type of covariates. The two heterogeneous data spaces are connected by indirect mapping. The crucial difference between the stepwise classification strategy implemented in this package and existing packages is that our approach aims to be cost-efficient: it avoids measuring additional covariates, which might be expensive or patient-unfriendly, for a potentially large subgroup of individuals. Moreover, in diagnosis, test results for these individuals would be available quickly, which may reduce waiting times and hence lower patients' distress. The improvement described remedies the key limitations of existing packages and facilitates the use of the stepwiseCM package in diverse applications.
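The neighborhood-projection idea in the abstract can be illustrated with a small sketch. This is not the stepwiseCM API (the package itself is written in R) and not the authors' exact algorithm; it is a simplified Python illustration under stated assumptions. The function names `nearest_neighbors` and `should_measure_second_type`, the Euclidean distance, the parameter `k`, and the rule "measure the second data type if the costly classifier was more accurate on the neighbors" are all hypothetical choices made for this example.

```python
import math


def nearest_neighbors(x, train_points, k):
    """Return indices of the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, t), i) for i, t in enumerate(train_points))
    return [i for _, i in dists[:k]]


def should_measure_second_type(x_cheap, train_cheap, correct_cheap, correct_costly, k=3):
    """Decide whether a new sample is likely to benefit from the costly data type.

    Assumption (hypothetical rule, for illustration only): every training sample
    has both data types, and correct_cheap[i] / correct_costly[i] record whether
    sample i was classified correctly using the cheap / costly data type,
    for example in cross-validation. The new sample has cheap covariates only.
    """
    # Find the new sample's neighborhood in the cheap data space.
    nbrs = nearest_neighbors(x_cheap, train_cheap, k)
    # Compare how well each data type classified those neighbors.
    acc_cheap = sum(correct_cheap[i] for i in nbrs) / len(nbrs)
    acc_costly = sum(correct_costly[i] for i in nbrs) / len(nbrs)
    # Recommend the costly measurement only where it did better locally.
    return acc_costly > acc_cheap


# Toy training set: two clusters in a two-dimensional "clinical" space.
train_cheap = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
correct_cheap = [True, True, True, False, False, True]
correct_costly = [True, False, True, True, True, True]

# Sample near the first cluster: cheap data already classifies its neighbors well.
decision_a = should_measure_second_type((0.2, 0.2), train_cheap, correct_cheap, correct_costly)
# Sample near the second cluster: the costly data type did better on its neighbors.
decision_b = should_measure_second_type((5.2, 5.2), train_cheap, correct_cheap, correct_costly)
```

In this toy setting only the second sample would be passed on for the additional measurement, which reflects the cost-saving motivation described in the abstract: samples whose neighborhood is already well classified by the cheap covariates skip the expensive assay.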
Pages: 1-11
Number of pages: 11