A joint optimization framework integrated with biological knowledge for clustering incomplete gene expression data

Cited by: 2
Authors
Li, Dan [1]
Gu, Hong [1]
Chang, Qiaozhen [1]
Wang, Jia [2]
Qin, Pan [1]
Affiliations
[1] Dalian Univ Technol, Fac Elect Informat & Elect Engn, Dalian 116024, Peoples R China
[2] Dalian Med Univ, Dept Breast Surg, Hosp 2, Dalian 116023, Peoples R China
Keywords
Gene clustering; Joint optimization; Multi-objective clustering; Imputation; Gene ontology; MISSING VALUE ESTIMATION; VALIDITY MEASURE; ALGORITHM; REPRODUCIBILITY; IMPUTATION; SELECTION; SEARCH;
DOI
10.1007/s00500-022-07180-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering algorithms have been successfully applied to identify co-expressed gene groups from gene expression data. Missing values often occur in such data, which poses a challenge for gene clustering. When incomplete gene expression data are partitioned into co-expressed gene groups, missing value imputation and clustering are generally performed as two separate processes. These two-stage methods are likely to produce imputed values that are unsuitable for the clustering task and, in turn, unsatisfactory clustering performance. To address this problem, this paper proposes a multi-objective joint optimization framework for clustering incomplete gene expression data. The proposed framework imputes the missing expression values under the guidance of clustering, and thereby achieves a synergistic improvement of imputation and clustering. In addition, gene expression similarity and gene semantic similarity extracted from the Gene Ontology are combined, in the form of a functional neighbor interval for each missing expression value, to provide reasonable constraints for the joint optimization framework. Experiments are carried out on several benchmark data sets. In terms of the average improvement rate over the data sets and different missing rates, the framework reduces the imputation error by 6.4-14.7% and increases the clustering accuracy by 4.0-10.1% compared with six popular and promising methods. Furthermore, the biological significance of the identified gene clusters is reported to evaluate the effectiveness of the proposed framework.
Pages: 13639-13656
Number of pages: 18
Source
Soft Computing, 2023, 27: 13639-13656