Identification of infectious disease-associated host genes using machine learning techniques

Cited by: 17
Authors
Barman, Ranjan Kumar [1, 2]
Mukhopadhyay, Anirban [3]
Maulik, Ujjwal [2]
Das, Santasabuj [1, 4]
Affiliations
[1] ICMR Natl Inst Cholera & Enter Dis, Biomed Informat Ctr, Kolkata, W Bengal, India
[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[3] Univ Kalyani, Dept Comp Sci & Engn, Kalyani, W Bengal, India
[4] ICMR Natl Inst Cholera & Enter Dis, Div Clin Med, P-33 CIT Rd Scheme XM, Kolkata 700010, W Bengal, India
Keywords
Classification; Deep neural networks; Functional annotations; Infectious disease-associated host genes; Sequence and interaction network features; INTEGRATED APPROACH; INFORMATION; PRIORITIZATION; REPRESENTATION; NETWORK
DOI
10.1186/s12859-019-3317-0
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identifying host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help to identify novel therapeutic targets.
Results: We developed a machine learning-based classification approach that identifies infectious disease-associated host genes by integrating sequence and protein interaction network features. Among the methods compared, a Deep Neural Network (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy, 86.33%, with a sensitivity of 85.61% and a specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. To predict unknown infectious disease-associated host genes, we then applied the proposed DNN model to all reviewed proteins from the database. Seventy-six of the 100 highly predicted infectious disease-associated genes from our study were also found in experimentally verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared with one or more other diseases, such as cancer, metabolic, and immune-related diseases.
Conclusions: To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious diseases. However, our results indicate that for small datasets, the advanced DNN-based method does not offer a significant advantage over simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF), for the prediction of infectious disease-associated host genes. The significant overlap of infectious disease with cancer and metabolic disease in the disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identifying novel candidate genes associated with infectious diseases would help to explain disease pathogenesis further and to develop novel therapeutics.
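The abstract reports accuracy, sensitivity, and specificity for the binary classifier. As a reminder of how these three figures relate to a confusion matrix, here is a minimal sketch in Python. It is not the authors' code: the function name `classification_metrics` and the toy labels are hypothetical, and label 1 is assumed to mean "infectious disease-associated" while 0 means "not associated".

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for binary labels.

    Assumption (not from the source): 1 = disease-associated gene,
    0 = non-associated gene.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    return {
        # Fraction of all genes classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of true positives recovered (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of true negatives recovered (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy example with made-up labels, for illustration only.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when the positive and negative classes differ in size, because accuracy alone can hide poor performance on the smaller class.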
Pages: 12
Related papers
50 in total
  • [1] Identification of infectious disease-associated host genes using machine learning techniques
    Ranjan Kumar Barman
    Anirban Mukhopadhyay
    Ujjwal Maulik
    Santasabuj Das
    [J]. BMC Bioinformatics, 20
  • [2] Identification of disease-associated loci using machine learning for genotype and network data integration
    Leal, Luis G.
    David, Alessia
    Jarvelin, Marjo-Riita
    Sebert, Sylvain
    Mannikko, Minna
    Karhunen, Ville
    Seaby, Eleanor
    Hoggart, Clive
    Sternberg, Michael J. E.
    [J]. BIOINFORMATICS, 2019, 35 (24) : 5182 - 5190
  • [3] RNA interference for the identification of disease-associated genes
    Nencioni, A
    Sandy, P
    Dillon, C
    Kissler, S
    Blume-Jensen, P
    Van Parijs, L
    [J]. CURRENT OPINION IN MOLECULAR THERAPEUTICS, 2004, 6 (02) : 136 - 140
  • [4] Infectious disease-associated encephalopathies
    Barbosa-Silva, Maria C.
    Lima, Maiara N.
    Battaglini, Denise
    Robba, Chiara
    Pelosi, Paolo
    Rocco, Patricia R. M.
    Maron-Gutierrez, Tatiana
    [J]. CRITICAL CARE, 2021, 25 (01)
  • [5] Grape Leaf Disease Identification using Machine Learning Techniques
    Jaisakthi, S. M.
    Mirunalini, P.
    Thenmozhi, D.
    Vatsala
    [J]. 2019 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS 2019), 2019,
  • [6] Infectious disease-associated encephalopathies
    Maria C. Barbosa-Silva
    Maiara N. Lima
    Denise Battaglini
    Chiara Robba
    Paolo Pelosi
    Patricia R. M. Rocco
    Tatiana Maron-Gutierrez
    [J]. Critical Care, 25
  • [7] Identification of the disease-associated genes in periodontitis using the co-expression network
    G. P. Sun
    T. Jiang
    P. F. Xie
    J. Lan
    [J]. Molecular Biology, 2016, 50 : 124 - 131
  • [8] Identification of the Disease-Associated Genes in Periodontitis Using the Co-expression Network
    Sun, G. P.
    Jiang, T.
    Xie, P. F.
    Lan, J.
    [J]. MOLECULAR BIOLOGY, 2016, 50 (01) : 124 - 131
  • [9] Identification of Disease-Associated Genes Based on Differential Intron Retention
    Wu, Zhenpeng
    Zheng, Jiantao
    Li, Hong-Dong
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 228 - 231
  • [10] Paraphrase Identification using Machine Learning Techniques
    Chitra, A.
    Kumar, C. S. Saravana
    [J]. RECENT ADVANCES IN NETWORKING, VLSI AND SIGNAL PROCESSING, 2010, : 245 - +