An Algorithm for Finding Biologically Significant Features in Microarray Data Based on A Priori Manifold Learning

Cited by: 5
Authors
Hira, Zena M. [1]
Trigeorgis, George [1]
Gillies, Duncan F. [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London, England
Source
PLOS ONE | 2014 / Volume 9 / Issue 03
Keywords
CLASSIFICATION; DISCOVERY; SELECTION; CANCER; ISOMAP;
DOI
10.1371/journal.pone.0090562
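Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code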
07 ; 0710 ; 09 ;
Abstract
Microarray databases are a large source of genetic data which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to classify different types of cancer or to distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, and this causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data is projected onto it and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG database is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap.
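The contrast the abstract draws between PCA and Isomap rests on how distances are measured: Isomap replaces straight-line distances with distances measured along a neighbourhood graph. The sketch below illustrates only that step of conventional Isomap on toy data, in pure Python. It is not the authors' a priori method and does not involve KEGG data; the function names, the semicircle data, and the choice `k=2` are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, k):
    """Approximate along-the-manifold distances, as conventional Isomap does:
    build a k-nearest-neighbour graph, then take shortest paths through it."""
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # The k nearest neighbours of point i, excluding i itself.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )[:k]
        for j in neighbours:
            d = euclidean(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d  # keep the graph symmetric
    # Floyd-Warshall: all-pairs shortest paths over the neighbour graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


# Hypothetical toy data: 21 points spaced evenly along a unit semicircle.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]

geo = geodesic_distances(points, k=2)
straight = euclidean(points[0], points[-1])  # chord across the semicircle
along = geo[0][-1]                           # path that follows the curve

print(f"straight-line distance: {straight:.3f}")
print(f"graph (geodesic) distance: {along:.3f}")
```

On this toy curve the graph distance between the two endpoints is close to the arc length, while the straight-line distance is the much shorter chord. A full Isomap embedding would then apply multidimensional scaling to the graph distances, which is omitted here.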
Pages: 16
Related papers
50 in total
  • [2] Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
    Wang, Zixuan
    Zhou, Yi
    Takagi, Tatsuya
    Song, Jiangning
    Tian, Yu-Shi
    Shibuya, Tetsuo
    BMC BIOINFORMATICS, 2023, 24 (01)
  • [3] Identification of Biologically Significant Genes from Combinatorial Microarray Data
    Kong, Chang Sun
    Yu, Jing
    Minion, F. Chris
    Rajan, Krishna
    ACS COMBINATORIAL SCIENCE, 2011, 13 (05) : 562 - 571
  • [4] Identification of significant features in DNA microarray data
    Bair, Eric
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2013, 5 (04): : 309 - 325
  • [5] An Algorithm for Removing Redundancy Features in Microarray Data
    Yang, Sheng
    Zhao, Jun
    PROCEEDINGS 2013 INTERNATIONAL CONFERENCE ON MECHATRONIC SCIENCES, ELECTRIC ENGINEERING AND COMPUTER (MEC), 2013, : 1559 - 1563
  • [6] Mining biologically significant co-regulation patterns from microarray data
    Zhao, Yuhai
    Yin, Ying
    Wang, Guoren
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS, 2006, 4062 : 408 - 414
  • [7] Application of a priori established gene sets to discover biologically important differential expression in microarray data
    Bild, A
    Febbo, PG
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (43) : 15278 - 15279
  • [8] Abnormal recognition algorithm based on manifold learning for turbopump mass data
    Xia, Lu-Rui
    Hu, Niao-Qing
    Qin, Guo-Jun
    Hangkong Dongli Xuebao/Journal of Aerospace Power, 2011, 26 (03): : 698 - 703
  • [9] An improved manifold learning algorithm for data visualization
    Gu, Rui-Jun
    Xu, Wen-Bo
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 1170 - +
  • [10] Digging for Significant Genes in Microarray Expression Data Based on Systematic Sampling and Hierarchal Clustering Algorithm
    Mohammed, Nwayyin N.
    GENEDIS 2020: COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 1338 : 1 - 6