Utilization of virtual samples to facilitate cancer identification for DNA microarray data in the early stages of an investigation

Cited by: 22
Authors
Li, Der-Chiang [2 ]
Fang, Yao-Hwei [3 ]
Lai, Yung-Yao [2 ]
Hu, Susan C. [1 ]
Affiliations
[1] Natl Cheng Kung Univ, Dept Publ Hlth, Coll Med, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Dept Ind & Informat Management, Tainan 701, Taiwan
[3] Natl Hlth Res Inst, Div Biostat & Bioinformat, Zhunan 350, Miaoli County, Taiwan
Keywords
Classification; DNA microarray; Gene selection; Small-sample problem; Virtual Sample Generation; CLASSIFICATION METHODS; KNOWLEDGE; SELECTION; GENES; TUMOR;
DOI
10.1016/j.ins.2009.04.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
DNA microarray datasets are generally small, high-dimensional with many non-discriminative genes, and non-linear with outliers. Their ratio of sample size to dimension makes them small-sample problems. Recently, researchers have developed various gene selection algorithms to discover the genes most relevant to a specific disease and thereby reduce computation. Most gene selection algorithms improve learning performance and efficiency, but they still suffer from insufficient training samples in the datasets. Moreover, in the early stage of diagnosing a new disease, very limited data can be obtained, so the derived diagnostic model is usually unreliable for identifying the new disease. Consequently, diagnostic performance cannot always be robust, even with gene selection algorithms. To address the problem of a very limited training dataset with non-linear data or outliers, we propose GVSG (Group Virtual Sample Generation), a non-linear Virtual Sample Generation algorithm. This method detects the characteristics of the very limited data, forms discrete groups for each discriminative gene, and systematically generates virtual samples for each group to accelerate and stabilize the modeling process. The results show that this method significantly improves learning accuracy on DNA microarray data in the early stage of an investigation. (c) 2009 Elsevier Inc. All rights reserved.
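The abstract names the idea (form discrete groups for each discriminative gene, then generate virtual samples per group) but does not give the grouping or generation rules. The sketch below is therefore only an illustration of group-wise virtual sample generation for a single gene, not the authors' GVSG. The gap-based grouping rule, the uniform sampling inside each group's range, and the names `split_into_groups`, `generate_virtual_values`, and `gap_factor` are all assumptions made for this example.

```python
import random


def split_into_groups(values, gap_factor=2.0):
    """Split 1-D expression values into discrete groups.

    Assumed rule (not from the paper): start a new group wherever the gap
    between consecutive sorted values exceeds gap_factor times the mean gap.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        return [ordered]
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps)
    groups = [[ordered[0]]]
    for value, gap in zip(ordered[1:], gaps):
        if gap > gap_factor * mean_gap:
            groups.append([value])      # large gap: open a new group
        else:
            groups[-1].append(value)    # small gap: extend the current group
    return groups


def generate_virtual_values(values, n_virtual, gap_factor=2.0, seed=0):
    """Draw virtual values for one gene, group by group.

    Assumed rule (not from the paper): pick a group with probability
    proportional to its size, then sample uniformly within its range.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    groups = split_into_groups(values, gap_factor)
    weights = [len(group) for group in groups]
    virtual = []
    for _ in range(n_virtual):
        group = rng.choices(groups, weights=weights, k=1)[0]
        virtual.append(rng.uniform(min(group), max(group)))
    return virtual


# Hypothetical expression values of one gene within one class.
observed = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1]
groups = split_into_groups(observed)
virtual = generate_virtual_values(observed, n_virtual=20)
```

Because sampling stays inside each group's observed range, no virtual value falls in the empty region between the two clusters, which is the kind of non-linear structure a single global interval would blur.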
Pages: 2740-2753
Number of pages: 14