Semi-supervised clustering for gene-expression data in multiobjective optimization framework

Cited by: 25
Authors
Alok, Abhay Kumar [1]
Saha, Sriparna [1]
Ekbal, Asif [1]
Affiliations
[1] Indian Inst Technol, Comp Sci Engn, Patna, Bihar, India
Keywords
Gene expression data clustering; Semi-supervised classification; Multiobjective optimization; Cluster validity index; AMOSA; TRANSCRIPTIONAL PROGRAM; OLIGONUCLEOTIDE ARRAYS; COEXPRESSED GENES; ALGORITHM; MICROARRAY; PATTERNS; CLASSIFICATION; INDEXES;
DOI
10.1007/s13042-015-0335-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Studying the patterns hidden in gene expression data helps in understanding the functionality of genes. However, because of the large number of genes and the complexity of biological networks, it is difficult to study the resulting mass of data, which often consists of millions of measurements. Clustering techniques are applied to reveal natural structures and identify interesting patterns in a given gene expression data set. Semi-supervised classification is a new direction in machine learning. It requires a large amount of unlabeled data and a small amount of labeled data. Semi-supervised classification in general performs better than unsupervised classification. However, to the best of our knowledge, no existing work solves the gene expression data clustering problem using semi-supervised classification techniques. In this paper we attempt to solve the gene expression data clustering problem using a multiobjective optimization based semi-supervised classification technique, with the aim of attaining good-quality partitions by using a small amount of labeled data. To generate the labeled data, the Fuzzy C-means clustering technique is applied first. To determine the partitioning automatically, multiple cluster centers corresponding to a cluster are encoded in the form of a string. To assess the quality of an obtained partitioning, the values of five objective functions are computed. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparisons with existing techniques for gene expression data clustering indicate that the proposed method is the most effective of those compared. Statistical and biological significance tests have also been carried out.
Pages: 421 - 439
Page count: 19
Related papers
50 in total
  • [31] Active Semi-supervised Framework with Data Editing
    Zhang, Xue
    Xiao, Wangxin
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2012, 9 (04) : 1513 - 1532
  • [32] Clinically driven semi-supervised class discovery in gene expression data
    Steinfeld, Israel
    Navon, Roy
    Ardigo, Diego
    Zavaroni, Ivana
    Yakhini, Zohar
    BIOINFORMATICS, 2008, 24 (16) : I90 - I97
  • [33] Semi-Supervised Clustering for Sparsely Sampled Longitudinal Data
    Takagishi, Mariko
    Yadohisa, Hiroshi
    COMPLEX ADAPTIVE SYSTEMS, 2015, 2015, 61 : 18 - 23
  • [34] A Semi-supervised Three-Way Clustering Framework for Multi-view Data
    Yu, Hong
    Wang, Xincheng
    Wang, Guoyin
    ROUGH SETS, IJCRS 2017, PT II, 2017, 10314 : 313 - 325
  • [35] A semi-supervised clustering approach using labeled data
    Taghizabet, A.
    Tanha, J.
    Amini, A.
    Mohammadzadeh, J.
    SCIENTIA IRANICA, 2023, 30 (01) : 104 - 115
  • [36] A Semi-supervised Fuzzy Co-clustering Framework and Application to Twitter Data Analysis
    Honda, Katsuhiro
    Ubukata, Seiki
    Notsu, Akira
    Takahashi, Norimitsu
    Ishikawa, Yutaka
    2015 4TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION ICIEV 15, 2015,
  • [37] A Fast Semi-Supervised Clustering Framework for Large-Scale Time Series Data
    He, Guoliang
    Pan, Yanzhou
    Xia, Xuewen
    He, Jinrong
    Peng, Rong
    Xiong, Neal N.
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (07) : 4201 - 4216
  • [38] Hierarchical Text Clustering and Categorisation using A Semi-Supervised Framework
    Mahyoub, Mohamed
    Hind, Jade
    Woods, David
    Wong, Carl
    Hussain, Abir
    Aljumeily, Dhiya
    12TH INTERNATIONAL CONFERENCE ON THE DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE 2019), 2019 : 153 - 159
  • [39] Combining Semi-supervised Clustering and Classification Under a Generalized Framework
    Jiang, Zhen
    Zhao, Lingyun
    Lu, Yu
    JOURNAL OF CLASSIFICATION, 2025, 42 (01) : 181 - 204
  • [40] A Clustering Framework for Unsupervised and Semi-Supervised New Intent Discovery
    Zhang, Hanlei
    Xu, Hua
    Wang, Xin
    Long, Fei
    Gao, Kai
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5468 - 5481