REC: fast sparse regression-based multicategory classification

Cited by: 1
Authors
Zhang, Chong [1]
Lu, Xiaoling [2]
Zhu, Zhengyuan [3]
Hu, Yin [4]
Singh, Darshan [5]
Jones, Corbin [6]
Liu, Jinze [7]
Prins, Jan F. [5]
Liu, Yufeng [8]
Affiliations
[1] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON N2L 3G1, Canada
[2] Renmin Univ China, Ctr Appl Stat, Sch Stat, Beijing, Peoples R China
[3] Iowa State Univ, Dept Stat, Ames, IA USA
[4] Sage Bionetworks, Seattle, WA USA
[5] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC USA
[6] Univ North Carolina Chapel Hill, Dept Biol, Chapel Hill, NC USA
[7] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[8] Univ North Carolina Chapel Hill, Dept Stat & Operat Res, UNC Lineberger Comprehens Canc Ctr, Dept Genet, Dept Biostat, Carolina Ctr Genome Sci, Chapel Hill, NC USA
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
LASSO; Parallel computing; Probability estimation; Simplex; Variable selection; SUPPORT VECTOR MACHINES; TUMOR CLASSIFICATION; LOGISTIC-REGRESSION; VARIABLE SELECTION; SHRINKAGE;
DOI
10.4310/SII.2017.v10.n2.a2
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Recent advances in technology enable researchers to gather and store enormous data sets with ultra-high dimensionality. In bioinformatics, microarray and next-generation sequencing technologies can produce data with tens of thousands of predictors of biomarkers. The corresponding sample sizes, on the other hand, are often limited. For classification problems, training the classifier with variable selection is desirable, and often necessary, both to predict new observations with high accuracy and to better understand the effect of predictors on classification. In the literature, sparse regularized classification techniques have been popular because they perform classification and variable selection simultaneously. Despite their success, such sparse penalized methods can be computationally slow when the dimension of the problem is ultra high. To overcome this challenge, we propose a new sparse REgression based multicategory Classifier (REC). Our method uses a simplex to represent the different categories of the classification problem. A major advantage of REC is that the optimization can be decoupled into smaller independent sparse penalized regression problems, which can then be solved in parallel. Consequently, REC is computationally very fast. Moreover, REC can provide class conditional probability estimates. Simulated examples and applications to microarray and next-generation sequencing data suggest that REC is very competitive with several existing methods.
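The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the k classes are coded as the vertices of a regular simplex in R^(k-1) (a standard construction from angle-based classification), that each of the k-1 target coordinates is fitted by its own L1-penalized least-squares regression, and that a new point is assigned to the class whose vertex has the largest inner product with the fitted vector. The function names (`simplex_vertices`, `lasso_cd`, `fit_rec_sketch`, `predict`), the penalty value, and the toy data are all hypothetical, and class probability estimation is not covered.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) with pairwise inner product -1/(k-1)."""
    d = k - 1
    verts = [[1.0 / math.sqrt(d)] * d]
    shift = -(1.0 + math.sqrt(k)) / d ** 1.5
    scale = math.sqrt(k / d)
    for j in range(d):
        v = [shift] * d
        v[j] += scale
        verts.append(v)
    return verts


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
        # Re-center the residuals to update the unpenalized intercept.
        mean_resid = sum(resid) / n
        b0 += mean_resid
        resid = [r - mean_resid for r in resid]
    return b0, beta


def fit_rec_sketch(X, labels, k, lam):
    """Fit k-1 independent lasso regressions, one per simplex coordinate."""
    W = simplex_vertices(k)
    targets = [[W[c][q] for c in labels] for q in range(k - 1)]
    # The k-1 problems share no parameters, so they can run concurrently.
    # Threads only show the structure; real speedups in CPython would need
    # separate processes or a compiled solver.
    with ThreadPoolExecutor() as pool:
        fits = list(pool.map(lambda t: lasso_cd(X, t, lam), targets))
    return W, fits


def predict(W, fits, x):
    """Assign x to the class whose vertex best aligns with the fitted vector."""
    f = [b0 + sum(b * xj for b, xj in zip(beta, x)) for b0, beta in fits]
    scores = [sum(wq * fq for wq, fq in zip(w, f)) for w in W]
    return max(range(len(W)), key=scores.__getitem__)


def make_toy_data(n_per_class=20, p=6, seed=0):
    """Three classes separated in the first two features; the rest is noise."""
    rng = random.Random(seed)
    centers = [(3.0, 0.0), (-1.5, 2.6), (-1.5, -2.6)]
    X, labels = [], []
    for c, (a, b) in enumerate(centers):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.5) for _ in range(p)]
            row[0] += a
            row[1] += b
            X.append(row)
            labels.append(c)
    return X, labels


X, labels = make_toy_data()
W, fits = fit_rec_sketch(X, labels, k=3, lam=0.05)
accuracy = sum(predict(W, fits, x) == c for x, c in zip(X, labels)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

Because each simplex coordinate gets its own regression, the cost of one fit does not depend on the number of classes; adding a class only adds one more independent problem, which is what makes the parallel solution mentioned in the abstract possible.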
Pages: 175-185
Page count: 11