Metric learning on expression data for gene function prediction

Cited by: 12
Authors
Makrodimitris, Stavros [1, 2]
Reinders, Marcel J. T. [1, 3]
van Ham, Roeland C. H. J. [1, 2]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Keygene NV, NL-6708 PW Wageningen, Netherlands
[3] Leiden Univ, Leiden Computat Biol Ctr, Med Ctr, NL-2333 ZC Leiden, Netherlands
Keywords
REGRESSION; ALGORITHM; SELECTION; ENSEMBLE
DOI
10.1093/bioinformatics/btz731
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest.
Results: To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for Guilt-By-Association-based function prediction. More specifically, if two genes are both annotated with a given GO term, MLC tries to maximize their weighted co-expression; if only one of them is annotated with that term, MLC tries to minimize it. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric performance. Moreover, our method performs particularly well on more specific terms, which are the most interesting. Finally, by inspecting the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel relevant conditions, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa.
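The idea of a sample-weighted co-expression measure can be illustrated with a small sketch. The code below is not the authors' MLC implementation and does not learn any weights; it only computes a weighted Pearson correlation for weights that are fixed by hand. The function name `weighted_pearson`, the toy expression vectors, and the two weight vectors are all assumptions made for illustration.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y under non-negative sample weights w."""
    total = sum(w)
    w = [wi / total for wi in w]  # normalise the weights so they sum to 1
    mx = sum(wi * xi for wi, xi in zip(w, x))  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y))  # weighted mean of y
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: the two genes co-vary in the first three samples
# and behave unrelatedly in the last three.
gene_a = [1.0, 2.0, 3.0, 5.0, 1.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 1.0, 3.0, 2.0]

uniform = [1.0] * 6                        # reduces to ordinary Pearson
focused = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # hand-picked, not learned

r_uniform = weighted_pearson(gene_a, gene_b, uniform)
r_focused = weighted_pearson(gene_a, gene_b, focused)
print(r_uniform, r_focused)
```

With uniform weights the measure coincides with the standard Pearson correlation, while concentrating the weight on the first three samples raises the correlation of this toy pair. In the method described in the abstract, such weights would be learned per GO term rather than chosen manually.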
Pages: 1182-1190
Number of pages: 9