Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features

Times cited: 24
Authors
Aromolaran, Olufemi [1,2,3]
Beder, Thomas [2]
Oswald, Marcus [2]
Oyelade, Jelili [1,2,3]
Adebiyi, Ezekiel [1,3]
Koenig, Rainer [2]
Affiliations
[1] Covenant Univ, Dept Comp & Informat Sci, Ota, Ogun State, Nigeria
[2] Jena Univ Hosp, Integrated Res & Treatment Ctr, Ctr Sepsis Control & Care CSCC, Klinikum 1, D-07747 Jena, Germany
[3] Covenant Univ, Covenant Univ Bioinformat Res CUBRe, Ota, Ogun State, Nigeria
Keywords
Machine learning; Essential genes; Lethal; Drosophila; Essentiality prediction; Homo sapiens; Web server; Protein; Database; Update
DOI
10.1016/j.csbj.2020.02.022
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On the genome scale, these genes can be determined experimentally using RNAi or knockout screens, but this is very resource-intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic features (e.g. derived from the protein sequence) as well as extrinsic features (e.g. derived from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated from a large variety of aspects, comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Using cross-validation, we obtained high prediction performance: in D. melanogaster, the best model achieved an ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method that used only features derived from protein sequences (P < 0.001). Investigating which features contributed to this success, we found that all categories of features contributed, most prominently network topological, functional and sequence-based features. To evaluate our approach further, we applied the same workflow to essential gene prediction in human and achieved ROC-AUC = 0.97, PR-AUC = 0.73 and F1 = 0.64. In summary, this study shows that our carefully elaborated assembly of features, covering a broad range of intrinsic and extrinsic gene and protein features, enables intelligent systems to predict the essentiality of genes in an organism well. © 2020 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
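The abstract reports three metrics that react differently to class imbalance: ROC-AUC depends only on how essential genes are ranked relative to non-essential ones, whereas precision, and therefore PR-AUC and F1, also depends on how many non-essential genes are scored highly. Assuming, as is typical, that essential genes are a small minority, this helps explain how an ROC-AUC of 0.90 can coexist with a PR-AUC of 0.30. The following is a minimal, standard-library Python sketch on made-up toy scores (not data from the study). It computes ROC-AUC via the pairwise (Mann-Whitney) formulation, PR-AUC as average precision (one common estimator; the authors' exact computation is not stated here), and F1 at an assumed threshold of 0.5. All function and variable names (`roc_auc`, `average_precision`, `f1_at_threshold`, `TOY_SCORES`, `TOY_LABELS`) are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive (essential)
    gene is scored above a randomly chosen negative one; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as average precision: the mean of the precision values
    at the rank of each positive, after sorting by descending score.
    Tied scores are not treated specially in this sketch."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 score after calling every gene with score >= threshold essential.
    The 0.5 default is an assumption, not the study's threshold."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 2 essential genes (label 1) among 20, sorted by score.
TOY_SCORES = [0.95, 0.90, 0.85, 0.80, 0.75,
              0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.18, 0.16, 0.14, 0.12,
              0.10, 0.08, 0.06, 0.04, 0.02]
TOY_LABELS = [0, 1, 0, 0, 1] + [0] * 15

if __name__ == "__main__":
    print(round(roc_auc(TOY_LABELS, TOY_SCORES), 3),
          round(average_precision(TOY_LABELS, TOY_SCORES), 3),
          round(f1_at_threshold(TOY_LABELS, TOY_SCORES), 3))
    # prints: 0.889 0.45 0.571
```

On this toy set, three non-essential genes scored above an essential one cost little ROC-AUC (about 0.89) but directly lower precision, so average precision drops to 0.45 and F1 to about 0.57. This is consistent with, though not a reconstruction of, the gap between the ROC-AUC and PR-AUC values reported above.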
Pages: 612-621
Number of pages: 10
Related papers
  • [1] An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features
    Campos, Tulio L.
    Korhonen, Pasi K.
    Gasser, Robin B.
    Young, Neil D.
    [J]. COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2019, 17 : 785 - 796
  • [2] Machine learning-based approaches for disease gene prediction
    Duc-Hau Le
    [J]. BRIEFINGS IN FUNCTIONAL GENOMICS, 2020, 19 (5-6) : 350 - 363
  • [3] Prediction of Essential Genes based on Machine Learning and Information Theoretic Features
    Nigatu, Dawit
    Henkel, Werner
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 3: BIOINFORMATICS, 2017 : 81 - 92
  • [4] Gene essentiality prediction based on fractal features and machine learning
    Yu, Yongming
    Yang, Licai
    Liu, Zhiping
    Zhu, Chuansheng
    [J]. MOLECULAR BIOSYSTEMS, 2017, 13 (03) : 577 - 584
  • [5] Integrating network, sequence and functional features using machine learning approaches towards identification of novel Alzheimer genes
    Jamal, Salma
    Goyal, Sukriti
    Shanker, Asheesh
    Grover, Abhinav
    [J]. BMC GENOMICS, 2016, 17
  • [6] A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features
    Rozenwald, Michal B.
    Galitsyna, Aleksandra A.
    Sapunov, Grigory V.
    Khrameeva, Ekaterina E.
    Gelfand, Mikhail S.
    [J]. PEERJ COMPUTER SCIENCE, 2020, 6 : 2 - 21
  • [7] Fall risk prediction using temporal gait features and machine learning approaches
    Lim, Zhe Khae
    Connie, Tee
    Goh, Michael Kah Ong
    Saedon, Nor 'Izzati Binti
    [J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 7
  • [8] Prediction of Bacterial sRNAs Using Sequence-Derived Features and Machine Learning
    Jha, Tony
    Mendel, Jovinna
    Cho, Hyuk
    Choudhary, Madhusudan
    [J]. BIOINFORMATICS AND BIOLOGY INSIGHTS, 2022, 16