Amyloidogenic motifs revealed by n-gram analysis

Cited by: 49
Authors
Burdukiewicz, Michal [1]
Sobczyk, Piotr [2]
Rodiger, Stefan [3]
Duda-Madej, Anna [4]
Mackiewicz, Pawel [1]
Kotulska, Malgorzata [5]
Affiliations
[1] Univ Wroclaw, Dept Genom, Wroclaw, Poland
[2] Wroclaw Univ Sci & Technol, Fac Pure & Appl Math, Wroclaw, Poland
[3] Brandenburg Univ Technol Cottbus Senftenberg, Inst Biotechnol, Senftenberg, Germany
[4] Wroclaw Med Univ, Dept Microbiol, Wroclaw, Poland
[5] Wroclaw Univ Sci & Technol, Dept Biomed Engn, Fac Fundamental Problems Technol, Wroclaw, Poland
Source
SCIENTIFIC REPORTS | 2017 / Volume 7
Keywords
PREDICTION; DATABASE;
DOI
10.1038/s41598-017-13210-9
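Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code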
07 ; 0710 ; 09 ;
Abstract
Amyloids are proteins associated with several clinical disorders, including Alzheimer's disease and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked against the most popular tools for the detection of amyloid peptides using an external data set and obtained the highest values of performance measures (AUC: 0.90, MCC: 0.63). Our results showed sequential patterns in the amyloids that are strongly correlated with hydrophobicity, a tendency to form beta-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that were previously confirmed experimentally. AmyloGram is available as a web server (http://smorfland.uni.wroc.pl/shiny/AmyloGram/) and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis.
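The idea of encoding peptides in a reduced amino acid alphabet and counting n-grams, as described in the abstract, can be illustrated with a minimal Python sketch. The four-group alphabet below is a hypothetical grouping chosen only for illustration and is not the alphabet selected for AmyloGram; the function names are likewise assumptions. Only contiguous n-grams are counted here, and the random forest step is omitted.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet (illustration only; NOT AmyloGram's alphabet).
GROUPS = {
    "a": "AVLIMFWC",  # mostly hydrophobic residues
    "b": "STNQYGP",   # polar or small residues
    "c": "DE",        # acidic residues
    "d": "KRH",       # basic residues
}

# Invert the grouping: residue letter -> group letter.
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a peptide in the reduced alphabet (standard 20 residues only)."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper())


def count_ngrams(seq, n):
    """Count all contiguous n-grams of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_features(peptide, max_n=3):
    """Collect counts of 1..max_n-grams of the reduced-alphabet peptide."""
    reduced = reduce_sequence(peptide)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(count_ngrams(reduced, n))
    return features


if __name__ == "__main__":
    # Example hexapeptide; the features could then be passed to any classifier.
    print(reduce_sequence("KLVFFA"))
    print(ngram_features("KLVFFA", max_n=2))
```

A smaller alphabet shrinks the feature space considerably: with four groups there are at most 4^3 = 64 distinct 3-grams, compared with 20^3 = 8,000 for the full amino acid alphabet.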
Pages: 10
Related papers
50 in total
  • [1] Amyloidogenic motifs revealed by n-gram analysis
    Michał Burdukiewicz
    Piotr Sobczyk
    Stefan Rödiger
    Anna Duda-Madej
    Paweł Mackiewicz
    Małgorzata Kotulska
    [J]. Scientific Reports, 7
  • [2] N-gram Analysis of a Mongolian Text
    Altangerel, Khuder
    Tsend, Ganbat
    Jalsan, Khash-Erdene
    [J]. IFOST 2008: PROCEEDING OF THE THIRD INTERNATIONAL FORUM ON STRATEGIC TECHNOLOGIES, 2008, : 258 - 259
  • [3] N-GRAM ANALYSIS IN THE ENGINEERING DOMAIN
    Leary, Martin
    Pearson, Geoff
    Burvill, Colin
    Mazur, Maciej
    Subic, Aleksandar
    [J]. PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN (ICED 11): IMPACTING SOCIETY THROUGH ENGINEERING DESIGN, VOL 6: DESIGN INFORMATION AND KNOWLEDGE, 2011, 6 : 414 - 423
  • [4] N-gram Insight
    Prans, George
    [J]. AMERICAN SCIENTIST, 2011, 99 (05) : 356 - 357
  • [5] Applications of Boolean equations in n-gram analysis
    Marovac, Ulfeta
    [J]. ICIST '18: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES, 2018,
  • [6] N-gram analysis for computer virus detection
    Reddy, D. Krishna Sandeep
    Pujari, Arun K.
    [J]. JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2006, 2 (03) : 231 - 239
  • [7] Efficient n-gram analysis in R with cmscu
    Vinson, David W.
    Davis, Jason K.
    Sindi, Suzanne S.
    Dale, Rick
    [J]. BEHAVIOR RESEARCH METHODS, 2016, 48 (03) : 909 - 921
  • [8] Sentiment Analysis Using N-gram Technique
    Chidananda, Himadri Tanaya
    Das, Debashis
    Sagnika, Santwana
    [J]. PROGRESS IN COMPUTING, ANALYTICS AND NETWORKING, ICCAN 2017, 2018, 710 : 359 - 367
  • [9] Efficient n-gram analysis in R with cmscu
    David W. Vinson
    Jason K. Davis
    Suzanne S. Sindi
    Rick Dale
    [J]. Behavior Research Methods, 2016, 48 : 909 - 921
  • [10] Discovering Subtype Specific n-Gram Motifs in Class C GPCR N-Termini
    Konig, Caroline
    Alquezar, Rene
    Vellido, Alfredo
    Giraldo, Jesus
    [J]. RECENT ADVANCES IN ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT, 2017, 300 : 116 - 125