Sequence-based information-theoretic features for gene essentiality prediction

Cited by: 30
Authors
Nigatu, Dawit [1]
Sobetzko, Patrick [2]
Yousef, Malik [3]
Henkel, Werner [1]
Affiliations
[1] Jacobs Univ Bremen, Transmiss Syst Grp, Campus Ring 1, D-28759 Bremen, Germany
[2] Philipps Univ Marburg, LOEWE Zentrum Synthet Mikrobiol, Hans Meerwein Str, D-35043 Marburg, Germany
[3] Zefat Acad Coll, Community Informat Syst, IL-13206 Safed, Israel
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Keywords
Essential genes; Random Forest; Information-theoretic features; Machine learning; MARKOV-CHAIN; DRUG TARGETS; ORDER; IDENTIFICATION; MUTAGENESIS; SET
DOI
10.1186/s12859-017-1884-5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Identifying essential genes not only improves our understanding of the minimal gene set required for cellular life but also aids the identification of novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features derived exclusively from the gene sequences.
Results: We developed a Random Forest classifier and extensively evaluated model performance among and within 15 selected bacteria. In intra-organism predictions, where training and testing sets are taken from the same organism, AUC (Area Under the Curve) scores ranged from 0.73 to 0.90, with an average of 0.84. Cross-organism predictions using 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon evaluation yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method in other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84).
Conclusions: The proposed method enables simple and reliable identification of essential genes without searching databases for orthologs or requiring additional experimental data such as network topology and gene expression.
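As a rough illustration of the two ideas in the abstract (features computed from the gene sequence alone, and evaluation by AUC), the following minimal Python sketch may help. The abstract does not list the specific features used, so the Shannon entropy of k-mer frequencies is an assumed stand-in, not the authors' feature set. The function names `kmer_entropy` and `auc` are hypothetical, and the Random Forest classifier itself is not reproduced here.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=1):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq.

    Assumption: used here only as an example of a sequence-derived,
    information-theoretic feature; the paper's actual features may differ.
    """
    seq = seq.upper()
    # Overlapping k-mers: positions 0 .. len(seq) - k
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Four equally frequent nucleotides give the maximum of 2 bits.
    print(kmer_entropy("ACGT"))
    # Hypothetical classifier scores for two essential (1) and two
    # non-essential (0) genes.
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

In a fuller pipeline, features such as these would be computed per gene and passed to a classifier, and the AUC would be computed on held-out genes or held-out organisms, matching the intra-organism and cross-organism settings described in the abstract.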
Pages: 11