Sequence-based information-theoretic features for gene essentiality prediction

Cited by: 30
Authors
Nigatu, Dawit [1]
Sobetzko, Patrick [2]
Yousef, Malik [3]
Henkel, Werner [1]
Affiliations
[1] Jacobs Univ Bremen, Transmiss Syst Grp, Campus Ring 1, D-28759 Bremen, Germany
[2] Philipps Univ Marburg, LOEWE Zentrum Synthet Mikrobiol, Hans Meerwein Str, D-35043 Marburg, Germany
[3] Zefat Acad Coll, Community Informat Syst, IL-13206 Safed, Israel
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Keywords
Essential genes; Random Forest; Information-theoretic features; Machine learning; MARKOV-CHAIN; DRUG TARGETS; ORDER; IDENTIFICATION; MUTAGENESIS; SET;
DOI
10.1186/s12859-017-1884-5
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: Identification of essential genes is not only useful for our understanding of the minimal gene set required for cellular life but also aids the identification of novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features that are derived exclusively from the gene sequences.
Results: We developed a Random Forest classifier and extensively evaluated its performance within and across 15 selected bacteria. In intra-organism predictions, where training and testing sets are taken from the same organism, AUC (Area Under the Curve) scores ranged from 0.73 to 0.90, with an average of 0.84. Cross-organism predictions using the 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon schemes yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method in other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84).
Conclusions: The proposed method enables a simple and reliable identification of essential genes without searching databases for orthologs or requiring further experimental data such as network topology and gene expression.
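The abstract does not list the specific information-theoretic quantities or the Random Forest settings used, so the following is only a minimal sketch of the general idea: deriving numeric features from a gene sequence alone and scoring a ranking with AUC. Shannon entropy and the mutual information between neighbouring bases are used here as stand-in features, and the function names (`shannon_entropy`, `mutual_information`, `auc`) are hypothetical, not taken from the paper. The classifier itself is omitted.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the single-base distribution of a non-empty sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(seq, gap=1):
    """Mutual information (bits) between bases `gap` positions apart.

    Assumes len(seq) > gap so that at least one base pair exists.
    """
    pairs = Counter(zip(seq, seq[gap:]))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c   # marginal count of the first base in each pair
        right[b] += c  # marginal count of the second base in each pair
    return sum(
        (c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in pairs.items()
    )


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (essential) or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Illustrative feature vector for one made-up sequence (not real data).
example_features = [shannon_entropy("ATGGCGTACGTT"), mutual_information("ATGGCGTACGTT")]
```

In a full pipeline, one such feature vector per gene would be passed to a classifier (the abstract names a Random Forest), and the predicted essentiality scores would be compared against known labels with `auc`.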
Pages: 11