Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features

Cited by: 24
Authors
Aromolaran, Olufemi [1, 2, 3]
Beder, Thomas [2]
Oswald, Marcus [2]
Oyelade, Jelili [1, 2, 3]
Adebiyi, Ezekiel [1, 3]
Koenig, Rainer [2]
Affiliations
[1] Covenant Univ, Dept Comp & Informat Sci, Ota, Ogun State, Nigeria
[2] Jena Univ Hosp, Integrated Res & Treatment Ctr, Ctr Sepsis Control & Care CSCC, Klinikum 1, D-07747 Jena, Germany
[3] Covenant Univ, Covenant Univ Bioinformat Res CUBRe, Ota, Ogun State, Nigeria
Keywords
Machine learning; Essential genes; Lethal; Drosophila; Essentiality prediction; Homo sapiens; Web server; Protein; Database; Update
DOI
10.1016/j.csbj.2020.02.022
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification codes
071010; 081704
Abstract
Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On a genome scale, these genes can be identified experimentally with RNAi or knockout screens, but this is very resource-intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic features (e.g. derived from the protein sequence) as well as extrinsic features (e.g. derived from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated, covering a wide variety of aspects including nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Using cross-validation, we obtained excellent prediction performance: the best model for D. melanogaster achieved a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method that used only features derived from the protein sequences (P < 0.001). Investigating which features contributed to this success, we found that all categories of features contributed, most prominently network topological, functional and sequence-based features. To evaluate our approach, we applied the same workflow to essential gene prediction in human and achieved ROC-AUC = 0.97, PR-AUC = 0.73 and F1 = 0.64. In summary, this study shows that our well-elaborated assembly of features, covering a broad range of intrinsic and extrinsic gene and protein properties, enables intelligent systems to predict gene essentiality in an organism with good performance. (C) 2020 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
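The abstract reports three evaluation metrics: ROC-AUC, PR-AUC and F1. The sketch below is not the authors' pipeline. It assumes binary labels (1 = essential, 0 = non-essential) and one real-valued classifier score per gene, and it shows how each metric can be computed from such scores. Two further assumptions apply: PR-AUC is approximated by average precision, and F1 uses a decision threshold of 0.5. The toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via its rank interpretation: the probability that a randomly
    chosen positive (essential, label 1) gets a higher score than a randomly
    chosen negative (label 0). Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC approximated by average precision: the mean of the precision
    values at each rank where a positive is retrieved. Assumes at least one
    positive and no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 score after calling every gene with score >= threshold essential
    (the 0.5 default threshold is an assumption)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: 10 genes, only 2 essential (imbalanced classes).
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.95, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    print(round(roc_auc(labels, scores), 4))            # 0.8125
    print(round(average_precision(labels, scores), 4))  # 0.5
    print(round(f1_at_threshold(labels, scores), 4))    # 0.5
```

On this toy ranking, ROC-AUC is noticeably higher than average precision and F1. A random ranking has an expected ROC-AUC of 0.5 but an expected PR-AUC close to the fraction of positives. This is one reason PR-AUC and F1 tend to be much lower than ROC-AUC when essential genes are a small minority. In practice, `roc_auc_score`, `average_precision_score` and `f1_score` from `sklearn.metrics` compute the same quantities.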
Pages: 612-621
Number of pages: 10