Semi-supervised clustering for gene-expression data in multiobjective optimization framework

Cited by: 25
Authors
Alok, Abhay Kumar [1]
Saha, Sriparna [1]
Ekbal, Asif [1]
Affiliations
[1] Indian Inst Technol, Comp Sci Engn, Patna, Bihar, India
Keywords
Gene expression data clustering; Semi-supervised classification; Multiobjective optimization; Cluster validity index; AMOSA; TRANSCRIPTIONAL PROGRAM; OLIGONUCLEOTIDE ARRAYS; COEXPRESSED GENES; ALGORITHM; MICROARRAY; PATTERNS; CLASSIFICATION; INDEXES;
DOI
10.1007/s13042-015-0335-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Studying the patterns hidden in gene expression data helps in understanding the functionality of genes. However, because of the large number of genes and the complexity of biological networks, the resulting mass of data, which often consists of millions of measurements, is difficult to study. Clustering techniques are applied to reveal natural structures and to identify interesting patterns in a given gene expression data set. Semi-supervised classification is a new direction in machine learning. It requires a large amount of unlabeled data and a small amount of labeled data, and in general it performs better than unsupervised classification. To the best of our knowledge, however, no existing work solves the gene expression data clustering problem using semi-supervised classification techniques. In this paper we attempt to solve the gene expression data clustering problem with a multiobjective optimization based semi-supervised classification technique, with the aim of attaining good-quality partitions from only a few labeled data. The labeled data are generated by first applying the Fuzzy C-means clustering technique. To determine the partitioning automatically, multiple cluster centers corresponding to a cluster are encoded in the form of a string. The quality of an obtained partitioning is measured by computing the values of five objective functions. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparisons with existing techniques for gene expression data clustering show that the proposed method is the most effective one. Statistical and biological significance tests have also been carried out.
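The abstract states that the labeled data are generated by first applying Fuzzy C-means, but it does not say how labels are derived from the result. The following is a minimal sketch of one plausible reading, not the authors' implementation: run a plain Fuzzy C-means and keep a label only for points whose highest membership is large. The function names `fuzzy_c_means` and `pseudo_labels`, the fuzzifier `m=2.0`, and the cutoff `threshold=0.9` are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-5, seed=0):
    """Plain fuzzy C-means. Returns (centers, memberships).

    m is the fuzzifier (assumed value, must be > 1).
    """
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center,
        # floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def pseudo_labels(U, threshold=0.9):
    """Hypothetical labeling rule: keep the arg-max cluster only where the
    top membership reaches `threshold`; all other points get -1 (unlabeled)."""
    labels = U.argmax(axis=1)
    labels[U.max(axis=1) < threshold] = -1
    return labels


# Toy data standing in for expression profiles: two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = pseudo_labels(U)
```

Points left at `-1` would play the role of the unlabeled data in a semi-supervised step; how the paper actually selects its labeled subset is not stated in this record.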
Pages: 421-439
Number of pages: 19
Related papers
50 in total
  • [41] A semi-supervised framework of clustering selection for de-duplication
    Kushagra, Shrinu
    Saxena, Hemant
    Ilyas, Ihab F.
    Ben-David, Shai
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 208 - 219
  • [42] Semi-supervised clustering methods
    Bair, Eric
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2013, 5 (05): : 349 - 361
  • [43] SEMI-SUPERVISED SPECTRAL CLUSTERING
    Mai, Xiaoyi
    Couillet, Romain
    2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2018, : 2012 - 2016
  • [44] A review on semi-supervised clustering
    Cai, Jianghui
    Hao, Jing
    Yang, Haifeng
    Zhao, Xujun
    Yang, Yuqing
    INFORMATION SCIENCES, 2023, 632 : 164 - 200
  • [45] Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering
    Yu, Zhiwen
    Luo, Peinan
    You, Jane
    Wong, Hau-San
    Leung, Hareton
    Wu, Si
    Zhang, Jun
    Han, Guoqiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (03) : 701 - 714
  • [46] Optimization Framework for Semi-supervised Attributed Graph Coarsening
    Kumar, Manoj
    Halder, Subhanu
    Kane, Archit
    Gupta, Ruchir
    Kumar, Sandeep
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 2064 - 2075
  • [47] Incremental Semi-supervised Clustering Ensemble for High Dimensional Data Clustering
    Yu, Zhiwen
    Luo, Peinan
    Wu, Si
    Han, Guoqiang
    You, Jane
    Leung, Hareton
    Wong, Hau-San
    Zhang, Jun
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1484 - 1485
  • [48] Double Selection Based Semi-Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles
    Yu, Zhiwen
    Chen, Hongsheng
    You, Jane
    Wong, Hau-San
    Liu, Jiming
    Li, Le
    Han, Guoqiang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 727 - 740
  • [49] A semi-supervised framework for mapping data to the intrinsic manifold
    Gong, HF
    Pan, CH
    Yang, Q
    Lu, HQ
    Ma, SD
    TENTH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 98 - 105
  • [50] Semi-supervised methods to predict patient survival from gene expression data
    Bair, E
    Tibshirani, R
    PLOS BIOLOGY, 2004, 2 (04) : 511 - 522