Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis

Cited by: 0
Authors
Cao, Xueyuan [1]
Pounds, Stan [2]
Affiliations
[1] Univ Tennessee, Hlth Sci Ctr, Dept Acute & Tertiary Care, Memphis, TN 38163 USA
[2] St Jude Childrens Res Hosp, Dept Biostat, 332 N Lauderdale St, Memphis, TN 38105 USA
Keywords
Gene profiling; Gene set; Distance correlation; Acute myeloid leukemia; False discovery rate; Functional categories; Enrichment analysis; Expression; Microarray
DOI
10.1186/s12859-021-04110-x
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitative, or censored event-time variables. Some distance-based methods, such as distance correlation, may detect complex non-monotonic associations of a gene set with a quantitative variable that elude other methods. However, distance correlation has yet to be generalized to associate gene sets with categorical and censored event-time endpoints. There is also a need to determine which genes empirically drive the significance of an association between a gene set and an endpoint. Results: We develop gene-set distance analysis (GSDA) by generalizing distance correlation to evaluate the association of a gene set with categorical and censored event-time variables. We also develop a backward elimination procedure to identify a subset of genes that empirically drive significant associations. In simulation studies, GSDA identified complex non-monotonic gene-set associations more effectively than six other published methods. In the analysis of a pediatric acute myeloid leukemia (AML) data set, GSDA was the only method to discover that event-free survival (EFS) was associated with the 56-gene AML pathway gene set, narrow that result down to 5 genes, and confirm the association of those 5 genes with EFS in a separate validation cohort. These results indicate that GSDA effectively identifies and characterizes complex non-monotonic gene-set associations that are missed by other methods. Conclusion: GSDA is a powerful and flexible method for detecting gene-set associations with categorical, quantitative, or censored event-time variables, especially complex non-monotonic associations. GSDA is available at https://CRAN.R-project.org/package=GSDA.
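To make the idea of distance correlation concrete, the following is a minimal Python sketch of the classical sample distance correlation between a multi-gene profile and a quantitative variable. It is not the GSDA method or the API of the GSDA R package: it covers only the quantitative-endpoint case that the abstract describes as the starting point, and the function names (`distance_correlation`, `pearson`) and the toy data are assumptions made for illustration.

```python
import math


def _centered_distances(points):
    """Return the double-centered Euclidean distance matrix of `points`."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y.

    x and y are lists with one tuple of floats per subject, e.g. the
    expression values of all genes in a set (x) and a phenotype (y).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)   # squared sample distance covariance
    dvar_x = mean_product(a, a)  # squared sample distance variance of x
    dvar_y = mean_product(b, b)  # squared sample distance variance of y
    if dvar_x <= 0 or dvar_y <= 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def pearson(u, v):
    """Plain Pearson correlation of two equal-length lists of floats."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


# Toy data (hypothetical): one "gene" on a symmetric grid and a phenotype
# that depends on it through a U-shaped, non-monotonic relationship.
gene = [-1 + 2 * i / 100 for i in range(101)]
phenotype = [g ** 2 for g in gene]

print("Pearson correlation: ", pearson(gene, phenotype))
print("Distance correlation:", distance_correlation(
    [(g,) for g in gene], [(p,) for p in phenotype]))
```

In this toy example the Pearson correlation is essentially zero because the relationship is symmetric around zero, while the distance correlation is clearly positive. This is the kind of non-monotonic association the abstract refers to. Assessing significance would additionally require something like a permutation test, which this sketch omits.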
Pages: 22
Related papers
50 results in total
  • [1] Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis
    Xueyuan Cao
    Stan Pounds
    BMC Bioinformatics, 22
  • [2] Gene-set analysis and reduction
    Dinu, Irina
    Potter, John D.
    Mueller, Thomas
    Liu, Qi
    Adewale, Adeniyi J.
    Jhangri, Gian S.
    Einecke, Gunilla
    Famulski, Konrad S.
    Halloran, Philip
    Yasui, Yutaka
    BRIEFINGS IN BIOINFORMATICS, 2009, 10 (01) : 24 - 34
  • [3] The statistical properties of gene-set analysis
    de Leeuw, Christiaan A.
    Neale, Benjamin M.
    Heskes, Tom
    Posthuma, Danielle
    NATURE REVIEWS GENETICS, 2016, 17 (06) : 353 - 364
  • [4] The statistical properties of gene-set analysis
    Christiaan A. de Leeuw
    Benjamin M. Neale
    Tom Heskes
    Danielle Posthuma
    Nature Reviews Genetics, 2016, 17 : 353 - 364
  • [5] massiveGST: A Mann-Whitney-Wilcoxon Gene-Set Test Tool That Gives Meaning to Gene-Set Enrichment Analysis
    Cerulo, Luigi
    Pagnotta, Stefano Maria
    ENTROPY, 2022, 24 (05)
  • [6] A Shrinkage Approach to Gene-Set Analysis
    Parks, Daniel C.
    Lin, Xiwu
    Parks, Joshua J.
    Menius, J. Alan
    Lee, Kwan R.
    STATISTICS IN BIOPHARMACEUTICAL RESEARCH, 2011, 3 (04) : 506 - 514
  • [7] GeneSetCluster: a tool for summarizing and integrating gene-set analysis results
    Ewing, Ewoud
    Planell-Picola, Nuria
    Jagodic, Maja
    Gomez-Cabrero, David
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [8] GeneSetCluster: a tool for summarizing and integrating gene-set analysis results
    Ewoud Ewing
    Nuria Planell-Picola
    Maja Jagodic
    David Gomez-Cabrero
    BMC Bioinformatics, 21
  • [9] DOT: Gene-set analysis by combining decorrelated association statistics
    Vsevolozhskaya, Olga A.
    Shi Min
    Hu Fengjiao
    Zaykin, Dmitri V.
    PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (04)
  • [10] Comparative evaluation of gene-set analysis methods
    Qi Liu
    Irina Dinu
    Adeniyi J Adewale
    John D Potter
    Yutaka Yasui
    BMC Bioinformatics, 8