Sequence-based information-theoretic features for gene essentiality prediction

Cited by: 30
Authors
Nigatu, Dawit [1 ]
Sobetzko, Patrick [2 ]
Yousef, Malik [3 ]
Henkel, Werner [1 ]
Affiliations
[1] Jacobs Univ Bremen, Transmiss Syst Grp, Campus Ring 1, D-28759 Bremen, Germany
[2] Philipps Univ Marburg, LOEWE Zentrum Synthet Mikrobiol, Hans Meerwein Str, D-35043 Marburg, Germany
[3] Zefat Acad Coll, Community Informat Syst, IL-13206 Safed, Israel
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Keywords
Essential genes; Random Forest; Information-theoretic features; Machine learning; MARKOV-CHAIN; DRUG TARGETS; ORDER; IDENTIFICATION; MUTAGENESIS; SET;
DOI
10.1186/s12859-017-1884-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Identification of essential genes is not only useful for our understanding of the minimal gene set required for cellular life but also aids the identification of novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features that are derived exclusively from the gene sequences.

Results: We developed a Random Forest classifier and performed an extensive model performance evaluation among and within 15 selected bacteria. In intra-organism predictions, where training and testing sets are taken from the same organism, AUC (Area Under the Curve) scores ranged from 0.73 to 0.90, with an average of 0.84. Cross-organism predictions using 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method in other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84).

Conclusions: The proposed method enables a simple and reliable identification of essential genes without searching databases for orthologs or requiring further experimental data such as network topology and gene expression.
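This record does not list the specific features the authors used, so the following is only a generic illustration of what a sequence-derived information-theoretic feature can look like. It computes the Shannon entropy of the overlapping k-mer distribution of a gene sequence. The function names `kmer_entropy` and `sequence_features`, the feature labels, and the choice of k = 1, 2, 3 are assumptions made for this sketch and are not taken from the paper.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Hypothetical helper for illustration; not the authors' feature definition.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        # Sequence shorter than k: no k-mers, so define the entropy as 0.
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sequence_features(seq: str) -> dict:
    """Collect k-mer entropies for k = 1, 2, 3 into one feature dictionary."""
    return {f"entropy_k{k}": kmer_entropy(seq, k) for k in (1, 2, 3)}


if __name__ == "__main__":
    # Toy sequence, not a real gene.
    print(sequence_features("ATGGCGTACGTTAGCTAA"))
```

Feature vectors of this kind, computed per gene, could then be passed to a classifier such as the Random Forest named in the abstract; that step is omitted here.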
Pages: 11