Metric learning on expression data for gene function prediction

Cited by: 12
Authors
Makrodimitris, Stavros [1 ,2 ]
Reinders, Marcel J. T. [1 ,3 ]
van Ham, Roeland C. H. J. [1 ,2 ]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Keygene NV, NL-6708 PW Wageningen, Netherlands
[3] Leiden Univ, Leiden Computat Biol Ctr, Med Ctr, NL-2333 ZC Leiden, Netherlands
Keywords
REGRESSION; ALGORITHM; SELECTION; ENSEMBLE
DOI
10.1093/bioinformatics/btz731
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest.
Results: To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for Guilt-By-Association-based function prediction. More specifically, if two genes are both annotated with a given GO term, MLC tries to maximize their weighted co-expression; if one of them is not annotated with that term, MLC tries to minimize it. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric performance. Moreover, our method performs particularly well on more specific terms, which are the most interesting. Finally, by inspecting the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel conditions that are relevant, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa.
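Illustration (not part of the published record): the abstract's central quantity, a co-expression measure in which each sample carries its own weight, can be sketched as a weighted Pearson correlation. The code below is a minimal sketch under stated assumptions and is not the authors' MLC implementation: the function name `weighted_pearson`, the toy expression profiles, and the hand-picked weights are all hypothetical, and MLC learns its weights per GO term rather than taking them as input.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative per-sample weights w."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means of the two expression profiles
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted covariance and variances (the common 1/total factor cancels)
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for zero weighted variance")
    return cov / math.sqrt(vx * vy)


# Hypothetical expression of two genes over six samples (toy values).
gene_a = [1.0, 2.0, 3.0, 4.0, 2.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 9.0, 1.0]

# Uniform weights reproduce the ordinary, unweighted Pearson correlation.
uniform = [1.0] * 6
# Hand-picked weights that ignore the last two samples (an assumption;
# a method such as MLC would learn such weights from GO annotations).
focused = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]

r_all = weighted_pearson(gene_a, gene_b, uniform)
r_focused = weighted_pearson(gene_a, gene_b, focused)
print(f"unweighted: {r_all:.3f}, weighted: {r_focused:.3f}")
```

In this toy case the two genes are perfectly correlated on the first four samples and uncorrelated overall, so down-weighting the last two samples raises the measured co-expression. This is the effect the abstract attributes to GO-term-specific sample weights.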
Pages: 1182-1190
Page count: 9
Related papers
50 in total
  • [1] Protein Expression Data Improves Gene Function Prediction
    Yang, Huadong
    Song, Xiaofeng
    Guo, Xuejiang
    2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016: 1869 - 1870
  • [2] Incremental Fuzzy Mining of Gene Expression Data for Gene Function Prediction
    Ma, Patrick C. H.
    Chan, Keith C. C.
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2011, 58 (05) : 1246 - 1252
  • [3] Selecting a classification function for class prediction with gene expression data
    Jong, Victor L.
    Novianti, Putri W.
    Roes, Kit C. B.
    Eijkemans, Marinus J. C.
    BIOINFORMATICS, 2016, 32 (12) : 1814 - 1822
  • [4] Discriminative local subspaces in gene expression data for effective gene function prediction
    Puelma, Tomas
    Gutierrez, Rodrigo A.
    Soto, Alvaro
    BIOINFORMATICS, 2012, 28 (17) : 2256 - 2264
  • [5] Consensus clustering of gene expression data and its application to gene function prediction
    Xiao, Guanghua
    Pan, Wei
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2007, 16 (03) : 733 - 751
  • [6] Mining Fuzzy Association Patterns in Gene Expression Data for Gene Function Prediction
    Ma, Patrick C. H.
    Chan, Keith C. C.
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2008: 84 - 89
  • [7] miRNA target identification and prediction as a function of time in gene expression data
    Grigaitis, Pranas
    Starkuviene, Vytaute
    Rost, Ursula
    Serva, Andrius
    Pucholt, Pascal
    Kummer, Ursula
    RNA BIOLOGY, 2020, 17 (07) : 990 - 1000
  • [8] Classification of gene-expression data: The manifold-based metric learning way
    Lee, Jianguo
    Zhang, Changshui
    PATTERN RECOGNITION, 2006, 39 (12) : 2450 - 2463
  • [9] Assessment of deep learning and transfer learning for cancer prediction based on gene expression data
    Hanczar, Blaise
    Bourgeais, Victoria
    Zehraoui, Farida
    BMC BIOINFORMATICS, 2022, 23 (01)