A joint optimization framework integrated with biological knowledge for clustering incomplete gene expression data

Cited by: 2
Authors
Li, Dan [1]
Gu, Hong [1]
Chang, Qiaozhen [1]
Wang, Jia [2]
Qin, Pan [1]
Affiliations
[1] Dalian Univ Technol, Fac Elect Informat & Elect Engn, Dalian 116024, Peoples R China
[2] Dalian Med Univ, Dept Breast Surg, Hosp 2, Dalian 116023, Peoples R China
Keywords
Gene clustering; Joint optimization; Multi-objective clustering; Imputation; Gene ontology; MISSING VALUE ESTIMATION; VALIDITY MEASURE; ALGORITHM; REPRODUCIBILITY; IMPUTATION; SELECTION; SEARCH;
DOI
10.1007/s00500-022-07180-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering algorithms have been successfully applied to identify co-expressed gene groups from gene expression data. Missing values occur frequently in such data and pose a challenge for gene clustering. When incomplete gene expression data are partitioned into co-expressed gene groups, missing value imputation and clustering are generally performed as two separate processes. These two-stage methods are likely to produce imputed values that are unsuitable for the clustering task and, in turn, unsatisfactory clustering performance. This paper proposes a multi-objective joint optimization framework for clustering incomplete gene expression data to address this problem. The proposed framework imputes the missing expression values under the guidance of clustering, so that imputation and clustering improve each other. In addition, gene expression similarity and gene semantic similarity extracted from the Gene Ontology are combined, in the form of a functional neighbor interval for each missing expression value, to provide reasonable constraints for the joint optimization framework. Experiments are carried out on several benchmark data sets. In terms of the average improvement rate across the data sets and missing rates, the framework reduces the imputation error by 6.4-14.7% and increases the clustering accuracy by 4.0-10.1% compared with six popular and promising methods. Furthermore, the biological significance of the identified gene clusters is reported to evaluate the effectiveness of the proposed framework.
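The idea of imputing under the guidance of clustering, with each missing value constrained to a neighbor-derived interval, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single-objective, k-means-style alternating loop in place of the multi-objective optimization, and it builds the interval from expression similarity only, omitting the Gene Ontology semantic similarity. The function names `neighbor_intervals` and `joint_impute_cluster` and all parameters are hypothetical.

```python
import numpy as np

def neighbor_intervals(X, observed, k=3):
    """Per-entry [low, high] bounds taken from the k most similar rows (genes).

    Assumption: similarity is RMS distance over columns observed in both rows.
    The Gene Ontology semantic similarity described in the abstract is omitted.
    """
    n, d = X.shape
    low = np.full((n, d), -np.inf)
    high = np.full((n, d), np.inf)
    for i in range(n):
        missing_cols = np.where(~observed[i])[0]
        if missing_cols.size == 0:
            continue
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                dists[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        order = np.argsort(dists)[:k]
        neighbors = order[np.isfinite(dists[order])]
        for c in missing_cols:
            vals = X[neighbors, c][observed[neighbors, c]]
            if vals.size:
                low[i, c], high[i, c] = vals.min(), vals.max()
    return low, high

def joint_impute_cluster(X, n_clusters, k=3, n_iter=20, seed=0):
    """Alternate cluster assignment, center update, and interval-clipped imputation."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    low, high = neighbor_intervals(X, observed, k)
    # Initial guess: column means of observed values, clipped to the intervals.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(observed, X, np.clip(col_means, low, high))
    rng = np.random.default_rng(seed)
    centers = filled[rng.choice(len(filled), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest center.
        d2 = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers from the current filled matrix.
        for c in range(n_clusters):
            members = filled[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        # Move missing entries toward their cluster center, within the interval.
        proposal = np.clip(centers[labels], low, high)
        filled = np.where(observed, X, proposal)
    return filled, labels
```

Observed entries are never modified; only missing entries are updated, and each update is clipped to the interval spanned by that gene's nearest neighbors, so the clustering step influences imputation without overriding the neighborhood constraint.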
Pages: 13639-13656
Page count: 18
Related papers
50 in total
  • [31] Gene Selection using Biological Knowledge and Fuzzy Clustering
    Ghosh, Sampreeti
    Mitra, Sushmita
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2012,
  • [32] Clustering gene expression series with prior knowledge
    Bréhélin, L
    [J]. ALGORITHMS IN BIOINFORMATICS, PROCEEDINGS, 2005, 3692 : 27 - 38
  • [33] A New Computational Framework for Gene Expression Clustering
    Kasim, Shahreen
    Deris, Safaai
    Othman, Razib M.
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I, 2010, 6440 : 603 - 610
  • [34] Clustering of gene expression data with Quantum-behaved Particle Swarm Optimization
    Chen, Wei
    Sun, Jun
    Ding, Yanrui
    Fang, Wei
    Xu, Wenbo
    [J]. NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027 : 388 - 396
  • [35] A Unified Tensor Framework for Clustering and Simultaneous Reconstruction of Incomplete Imaging Data
    Francis, Jobin
    Baburaj, M.
    George, Sudhish N.
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (03)
  • [36] An Incremental Clustering of Gene Expression data
    Das, Rosy
    Bhattacharyya, Dhruba K.
    Kalita, Jugal K.
    [J]. 2009 WORLD CONGRESS ON NATURE & BIOLOGICALLY INSPIRED COMPUTING (NABIC 2009), 2009, : 741 - +
  • [37] Clustering analysis for gene expression data
    Chen, YD
    Ermolaeva, O
    Bittner, M
    Meltzer, P
    Trent, J
    Dougherty, ER
    Batman, S
    [J]. ADVANCES IN FLUORESCENCE SENSING TECHNOLOGY IV, PROCEEDINGS OF, 1999, 3602 : 422 - 428
  • [38] Techniques for clustering gene expression data
    Kerr, G.
    Ruskin, H. J.
    Crane, M.
    Doolan, P.
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2008, 38 (03) : 283 - 293
  • [39] Fuzzy clustering of gene expression data
    Futschik, ME
    Kasabov, NK
    [J]. PROCEEDINGS OF THE 2002 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOL 1 & 2, 2002, : 414 - 419
  • [40] Validating clustering for gene expression data
    Yeung, KY
    Haynor, DR
    Ruzzo, WL
    [J]. BIOINFORMATICS, 2001, 17 (04) : 309 - 318