Utilization of virtual samples to facilitate cancer identification for DNA microarray data in the early stages of an investigation

Times cited: 22
Authors
Li, Der-Chiang [2]
Fang, Yao-Hwei [3]
Lai, Yung-Yao [2]
Hu, Susan C. [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Publ Hlth, Coll Med, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Dept Ind & Informat Management, Tainan 701, Taiwan
[3] Natl Hlth Res Inst, Div Biostat & Bioinformat, Zhunan 350, Miaoli County, Taiwan
Keywords
Classification; DNA microarray; Gene selection; Small-sample problem; Virtual Sample Generation; CLASSIFICATION METHODS; KNOWLEDGE; SELECTION; GENES; TUMOR
DOI
10.1016/j.ins.2009.04.003
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
DNA microarray datasets are generally small in size, high-dimensional with many non-discriminative genes, and non-linear with outliers. Because the number of samples is so small relative to the number of dimensions, DNA microarray datasets are regarded as small-sample problems. Researchers have recently developed various gene selection algorithms to discover the genes most relevant to a specific disease and thus reduce computation. Most gene selection algorithms improve learning performance and efficiency, but they still suffer from the limitation of insufficient training samples. Moreover, only very limited data can be obtained in the early stage of diagnosing a new disease, so a diagnostic model derived from such data is usually unreliable for identifying that disease. Consequently, diagnostic performance is not always robust, even with gene selection algorithms. To address the problem of a very limited training dataset with non-linear data or outliers, we propose GVSG (Group Virtual Sample Generation), a non-linear virtual sample generation algorithm. This method detects the characteristics of the very limited data, forms discrete groups for each discriminative gene, and systematically generates virtual samples for each group to accelerate and stabilize the modeling process. The results show that this method significantly improves learning accuracy when only early-stage DNA microarray data are available. (c) 2009 Elsevier Inc. All rights reserved.
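The abstract describes GVSG only at a high level and does not give its grouping or generation rules. Purely as a rough illustration of the general idea of group-wise virtual sample generation, here is a minimal Python sketch under our own assumptions; it is not the authors' GVSG algorithm. The helper names (`split_into_groups`, `generate_virtual_samples`, `augment`), the equal-size grouping of each gene's sorted values, and uniform sampling within each group's range are all hypothetical choices.

```python
import random
from typing import Dict, List

# Hypothetical illustration of group-wise virtual sample generation.
# This is NOT the published GVSG procedure; grouping and sampling rules are assumptions.

def split_into_groups(values: List[float], n_groups: int) -> List[List[float]]:
    """Sort one gene's values and cut them into contiguous, roughly equal-sized groups."""
    ordered = sorted(values)
    n_groups = max(1, min(n_groups, len(ordered)))  # never create empty groups
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def generate_virtual_samples(samples: List[List[float]], n_virtual: int,
                             n_groups: int = 2, seed: int = 0) -> List[List[float]]:
    """Create virtual samples for ONE class.

    For every gene, pick one of its groups at random and draw a value uniformly
    between that group's smallest and largest observed value.
    """
    rng = random.Random(seed)
    n_genes = len(samples[0])
    # Group each gene (column) independently of the others.
    per_gene_groups = [split_into_groups([s[j] for s in samples], n_groups)
                       for j in range(n_genes)]
    virtual = []
    for _ in range(n_virtual):
        row = []
        for groups in per_gene_groups:
            group = rng.choice(groups)
            row.append(rng.uniform(group[0], group[-1]))
        virtual.append(row)
    return virtual


def augment(data: Dict[str, List[List[float]]], n_virtual: int,
            n_groups: int = 2, seed: int = 0) -> Dict[str, List[List[float]]]:
    """Append n_virtual virtual samples to each class's real samples."""
    out = {}
    for i, (label, samples) in enumerate(sorted(data.items())):
        out[label] = samples + generate_virtual_samples(samples, n_virtual, n_groups, seed + i)
    return out


if __name__ == "__main__":
    # Toy two-gene data; the values are made up for illustration only.
    toy = {"tumor": [[2.1, 0.5], [2.4, 0.7], [5.0, 0.6], [5.3, 0.9]],
           "normal": [[1.0, 3.1], [1.2, 3.3], [0.9, 2.8]]}
    augmented = augment(toy, n_virtual=10, n_groups=2)
    print({k: len(v) for k, v in augmented.items()})  # {'normal': 13, 'tumor': 14}
```

Splitting each gene's values into separate groups lets the generated values follow a bimodal pattern (such as the tumor class's first gene above) instead of filling the gap between clusters. Because each gene is sampled independently, however, the sketch ignores correlations between genes, and it never generates values outside a class's observed range.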
Pages: 2740-2753
Number of pages: 14