TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection

Cited by: 29
Authors
Wang, Haiyan [1]
Zhang, Hongyan [2, 3, 4]
Dai, Zhijun [2, 4]
Chen, Ming-shun [5, 6]
Yuan, Zheming [2, 4]
Affiliations
[1] Kansas State Univ, Dept Stat, Manhattan, KS 66506 USA
[2] Hunan Prov Key Lab Crop Germplasm Innovat & Util, Changsha 410128, Hunan, Peoples R China
[3] Hunan Agr Univ, Coll Informat Sci & Technol, Changsha 410128, Hunan, Peoples R China
[4] Hunan Agr Univ, Coll Bio Safety Sci & Technol, Changsha 410128, Hunan, Peoples R China
[5] USDA ARS, Manhattan, KS 66506 USA
[6] Kansas State Univ, Dept Entomol, Manhattan, KS 66506 USA
Keywords
RANDOM SUBSPACE METHOD; MICROARRAY DATA; MOLECULAR CLASSIFICATION; EXPRESSION; PREDICTION; TUMOR; CARCINOMAS; REDUNDANCY; DISCOVERY; DIAGNOSIS;
DOI
10.1186/1755-8794-6-S1-S3
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Background: One of the challenges in classifying cancer tissue samples from gene expression data is to establish an effective method that selects a parsimonious set of informative genes. Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and Prediction Analysis of Microarrays (PAM) are four popular classifiers with comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, while TSP and k-TSP always use an even number of genes. In addition, k-TSP selects distinct gene pairs by simply combining the top-ranking pairs, without considering that the gene set with the best discriminative power may not be a combination of such pairs. The k-TSP algorithm also requires the user to specify an upper bound on the number of gene pairs. Here we introduce a computational algorithm that addresses these problems. The algorithm is named the Chisquare-statistic-based Top Scoring Genes (Chi-TSG) classifier, abbreviated as TSG.
Results: The TSG classifier starts with the top two genes and sequentially adds genes to the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected by cross-validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms the TSP family classifiers by a large margin on most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers: it is easy to interpret, it is invariant to monotone transformations, it often selects a small number of informative genes, which allows follow-up studies, and it is resistant to sampling variations because its operations are performed within each sample.
Conclusions: Redefining the scores for gene sets and the classification rules in TSP family classifiers by incorporating sample size information can lead to better selection of informative genes and better classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.
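The contrast the abstract draws between a proportion-based pair score and one that incorporates sample size can be illustrated with a minimal Python sketch. The function `tsp_score` follows the standard TSP definition, the absolute difference between classes in the frequency of the ordering X_i < X_j. The function `chi2_pair_score` is a hypothetical stand-in, not the score defined in the paper: it assumes a plain Pearson chi-square statistic on the 2x2 table of class against ordering. The toy data and all names are illustrative only.

```python
import numpy as np

def tsp_score(X, y, i, j):
    """Standard TSP score: |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|."""
    less = X[:, i] < X[:, j]          # within-sample comparison, one boolean per sample
    return abs(less[y == 0].mean() - less[y == 1].mean())

def chi2_pair_score(X, y, i, j):
    """Hypothetical count-aware score (an assumption, not the paper's formula):
    Pearson chi-square of the 2x2 table (class) x (X_i < X_j)."""
    less = X[:, i] < X[:, j]
    table = np.array(
        [[np.sum((y == c) & less), np.sum((y == c) & ~less)] for c in (0, 1)],
        dtype=float,
    )
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    valid = expected > 0              # skip empty margins to avoid division by zero
    return float(np.sum((table[valid] - expected[valid]) ** 2 / expected[valid]))

# Toy data: 6 samples (rows) x 3 genes (columns); gene 0 < gene 1 only in class 0.
X = np.array([
    [1.0, 2.0, 5.0],
    [0.5, 3.0, 4.0],
    [2.0, 2.5, 1.0],
    [3.0, 1.0, 2.0],
    [4.0, 0.5, 6.0],
    [2.5, 2.0, 0.1],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Both scores depend only on within-sample orderings, so a monotone
# transformation such as exp leaves them unchanged.
same_after_exp = (
    tsp_score(X, y, 0, 1) == tsp_score(np.exp(X), y, 0, 1)
    and np.isclose(chi2_pair_score(X, y, 0, 1), chi2_pair_score(np.exp(X), y, 0, 1))
)

# Duplicating every sample keeps the proportions (and the TSP score) fixed,
# while the chi-square statistic grows with the sample size.
X2, y2 = np.vstack([X, X]), np.concatenate([y, y])
print(same_after_exp)
print(tsp_score(X, y, 0, 1), tsp_score(X2, y2, 0, 1))
print(chi2_pair_score(X, y, 0, 1), chi2_pair_score(X2, y2, 0, 1))
```

In this sketch the proportion-based score cannot distinguish a perfect split seen in 6 samples from one seen in 12, whereas the count-based statistic can. That is one way to read "incorporating the sample size information", under the stated assumption about the form of the score.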
Pages: 14
Related papers
50 in total
  • [1] TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection
    Haiyan Wang
    Hongyan Zhang
    Zhijun Dai
    Ming-shun Chen
    Zheming Yuan
    [J]. BMC Medical Genomics, 6
  • [2] A new optimal binary tree SVM Multi-class Classification Algorithm
    Qin, Yuping
    Qin, Pengda
    Wang, Yi
    Lun, Shuxian
    [J]. MECHATRONICS, ROBOTICS AND AUTOMATION, PTS 1-3, 2013, 373-375 : 1085 - +
  • [3] An iterative Algorithm of Key Feature Selection for Multi-class Classification
    Jung, Daeun
    Park, Hyunggon
    [J]. 2019 ELEVENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN 2019), 2019, : 523 - 525
  • [4] Binary classification trees for multi-class classification problems
    Lee, JS
    Oh, LS
    [J]. SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003, : 770 - 774
  • [5] A GMM-Based Feature Selection Algorithm for Multi-Class Classification
    Choi, Tacksung
    Moon, Sunkuk
    Park, Young-cheol
    Youn, Dae-hee
    Lee, Seokpil
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (08): : 1584 - 1587
  • [6] Breaking the interactive bottleneck in multi-class classification with active selection and binary feedback
    Joshi, Ajay J.
    Porikli, Fatih
    Papanikolopoulos, Nikolaos
    [J]. 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, : 2995 - 3002
  • [7] Binary and Multi-Class Malware Threads Classification
    Ahmed, Ismail Taha
    Jamil, Norziana
    Din, Marina Md.
    Hammad, Baraa Tareq
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (24):
  • [8] Multi-class classification algorithm for optical diagnosis of oral cancer
    Majumder, S. K.
    Gupta, A.
    Gupta, S.
    Ghosh, N.
    Gupta, P. K.
    [J]. JOURNAL OF PHOTOCHEMISTRY AND PHOTOBIOLOGY B-BIOLOGY, 2006, 85 (02) : 109 - 117
  • [9] Multi-class feature selection for texture classification
    Chen, Xue-wen
    Zeng, Xiangyan
    van Alphen, Deborah
    [J]. PATTERN RECOGNITION LETTERS, 2006, 27 (14) : 1685 - 1691
  • [10] Efficient Decomposition Selection for Multi-class Classification
    Chen, Yawen
    Wen, Zeyi
    He, Bingsheng
    Chen, Jian
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3751 - 3764