iBBiG: iterative binary bi-clustering of gene sets

Cited by: 35
Authors
Gusenleitner, Daniel [1]
Howe, Eleanor A. [1, 2]
Bentink, Stefan [1, 3]
Quackenbush, John [1, 3, 4]
Culhane, Aedin C. [1, 3]
Affiliations
[1] Dana Farber Canc Inst, Dept Biostat & Computat Biol, Boston, MA 02115 USA
[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[3] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[4] Dana Farber Canc Inst, Dept Canc Biol, Boston, MA 02115 USA
Funding
National Institutes of Health (USA)
Keywords
ENRICHMENT ANALYSIS; BIOLOGICAL PROCESSES; MICROARRAY DATA; EXPRESSION DATA; DISEASES; CCL5;
DOI
10.1093/bioinformatics/bts438
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but it is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing the results of gene set enrichment analyses, and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics datasets that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to a meta-analysis of breast cancer studies, where iBBiG extracted novel gene set-phenotype associations that predicted tumor metastases within tumor subtypes.
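The discretization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `binarize_enrichment`, the 0.05 significance cutoff, the example p-values, and the layout (rows as gene sets, columns as phenotype comparisons) are all assumptions made for illustration. It only shows how a matrix of enrichment p-values might be turned into a binary gene set profile of the kind a bi-clustering algorithm takes as input.

```python
from typing import List


def binarize_enrichment(p_values: List[List[float]], alpha: float = 0.05) -> List[List[int]]:
    """Map each enrichment p-value to 1 if it is below alpha, otherwise 0.

    The cutoff alpha is an assumed illustrative threshold, not a value
    taken from the iBBiG paper.
    """
    return [[1 if p < alpha else 0 for p in row] for row in p_values]


# Hypothetical p-values: rows are gene sets, columns are phenotype comparisons.
p_values = [
    [0.001, 0.20, 0.03],
    [0.40, 0.04, 0.60],
]

binary = binarize_enrichment(p_values)
print(binary)  # [[1, 0, 1], [0, 1, 0]]
```

A bi-clustering method would then search this binary matrix for submatrices dense in ones, that is, groups of gene sets associated with the same groups of phenotypes.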
Pages: 2484-2492
Number of pages: 9
Related papers
50 in total
  • [1] A Sequential Gene Expression Data Bi-clustering Method: Clustering and Verification
    Zhang Yanjie
    Hu Zhanyi
    2009 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, VOL I, PROCEEDINGS, 2009, : 591 - +
  • [2] The Application of Bi-clustering and Bayesian Network for Gene Sets Network Construction in Breast Cancer Microarray Data
    Sohrabi, Ahmad
    Saraygord-Afshari, Neda
    Roudbari, Masoud
    MIDDLE EAST JOURNAL OF CANCER, 2022, 13 (04) : 624 - 640
  • [3] On approximate balanced bi-clustering
    Ma, GX
    Peng, JM
    Wei, Y
    COMPUTING AND COMBINATORICS, PROCEEDINGS, 2005, 3595 : 661 - 670
  • [4] Bi-clustering of Gene Expression Data Using Conditional Entropy
    Olomola, Afolabi
    Dua, Sumeet
    PATTERN RECOGNITION IN BIOINFORMATICS, PROCEEDINGS, 2009, 5780 : 244 - 254
  • [5] A Distance-Based Approach for Binary-Categorical Data Bi-Clustering
    Mujiono, Sadikin
    INTERNETWORKING INDONESIA, 2016, 8 (01): : 59 - 64
  • [6] Consensus Algorithm for Bi-clustering Analysis
    Foszner, Pawel
    Labaj, Wojciech
    Polanski, Andrzej
    Staniszewski, Michal
    COMPUTATIONAL SCIENCE, ICCS 2022, PT II, 2022, : 557 - 570
  • [7] Bi-clustering based recommendation system
    Mali, Mahesh
    Mishra, Dhirendra
    Vijayalaxmi, M.
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45 (04): : 1029 - 1039
  • [8] A bi-clustering framework for categorical data
    Pensa, RG
    Robardet, C
    Boulicaut, JF
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2005, 2005, 3721 : 643 - 650
  • [9] Bi-clustering Gene Expression Data Using Co-similarity
    Hussain, Syed Fawad
    ADVANCED DATA MINING AND APPLICATIONS, PT I, 2011, 7120 : 190 - 200
  • [10] A Convex Optimization Framework for Bi-Clustering
    Lim, Shiau Hong
    Chen, Yudong
    Xu, Huan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1679 - 1688