Sequence-based information-theoretic features for gene essentiality prediction

Cited by: 30
Authors
Nigatu, Dawit [1]
Sobetzko, Patrick [2]
Yousef, Malik [3]
Henkel, Werner [1]
Affiliations
[1] Jacobs Univ Bremen, Transmiss Syst Grp, Campus Ring 1, D-28759 Bremen, Germany
[2] Philipps Univ Marburg, LOEWE Zentrum Synthet Mikrobiol, Hans Meerwein Str, D-35043 Marburg, Germany
[3] Zefat Acad Coll, Community Informat Syst, IL-13206 Safed, Israel
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Keywords
Essential genes; Random Forest; Information-theoretic features; Machine learning; MARKOV-CHAIN; DRUG TARGETS; ORDER; IDENTIFICATION; MUTAGENESIS; SET;
DOI
10.1186/s12859-017-1884-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Identification of essential genes is not only useful for our understanding of the minimal gene set required for cellular life but also aids the identification of novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features that are derived exclusively from the gene sequences. Results: We developed a Random Forest classifier and performed an extensive model performance evaluation among and within 15 selected bacteria. In intra-organism predictions, where training and testing sets are taken from the same organism, we obtained AUC (area under the curve) scores ranging from 0.73 to 0.90, with an average of 0.84. Cross-organism predictions using the 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon schemes yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method in other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84). Conclusions: The proposed method enables a simple and reliable identification of essential genes without searching databases for orthologs or requiring further experimental data such as network topology and gene expression.
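The abstract does not list the individual features, so the following is only a minimal sketch of what sequence-derived information-theoretic features and an AUC score might look like in practice. It assumes k-mer Shannon entropy and the mutual information between neighboring bases as example features; these choices and every function name here (`shannon_entropy`, `mutual_information`, `features`, `auc`) are illustrative assumptions, not the authors' implementation. Training the Random Forest itself is left out.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq.

    Assumes len(seq) >= k so that at least one k-mer exists.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(seq, gap=1):
    """Mutual information (bits) between bases `gap` positions apart.

    Assumes len(seq) > gap so that at least one pair exists.
    """
    pairs = [(seq[i], seq[i + gap]) for i in range(len(seq) - gap)]
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi


def features(seq):
    """Hypothetical feature vector for one gene sequence (an assumed choice)."""
    return [
        shannon_entropy(seq, 1),     # single-base entropy
        shannon_entropy(seq, 3),     # overlapping 3-mer entropy
        mutual_information(seq, 1),  # dependence between adjacent bases
    ]


def auc(labels, scores):
    """AUC as the probability that a positive example outscores a negative one.

    Ties count as 0.5. Assumes at least one label of each class (1 and 0).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In such a setup, `features` would be applied to every gene sequence of an organism, the resulting vectors passed to any Random Forest implementation, and the predicted essentiality scores compared with the known labels through `auc`.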
Pages: 11