Variable Length Character N-Gram Embedding of Protein Sequences for Secondary Structure Prediction

Cited by: 3
Authors
Sharma, Ashish Kumar [1]
Srivastava, Rajeev [1]
Affiliations
[1] Indian Inst Technol BHU, Dept Comp Sci & Engn, Varanasi, Uttar Pradesh, India
Source
PROTEIN AND PEPTIDE LETTERS | 2021 / Vol. 28 / Issue 05
Keywords
Proteomics; protein secondary structure; amino acids sequence; character n-gram embedding; deep learning; bidirectional long short-term memory; WEB SERVER; NEURAL-NETWORKS; PROFILES;
DOI
10.2174/0929866527666201103145635
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline Classification Codes
071010 ; 081704 ;
Abstract
Background: The prediction of a protein's secondary structure from its amino acid sequence is an essential step towards predicting its 3-D structure. Prediction performance improves when homologous multiple sequence alignment information is incorporated. However, homologous information is not available for all proteins, so it is necessary to predict protein secondary structure from single sequences. Objective and Methods: Protein secondary structure is predicted from primary sequences using n-gram word embedding and a deep recurrent neural network. Protein secondary structure depends on both local and long-range neighboring residues in the primary sequence. In the proposed work, variable-length character n-gram words capture the local contextual information of amino acid residues, and an embedding vector represents each of these n-gram words. A bidirectional long short-term memory (Bi-LSTM) model then captures long-range context by extracting information from past and future residues in the primary sequence. Results: The proposed model is evaluated on three public datasets: ss.txt, RS126, and CASP9. It achieves Q3 accuracies of 92.57%, 86.48%, and 89.66% on ss.txt, RS126, and CASP9, respectively. Conclusion: The performance of the proposed model is compared with state-of-the-art methods available in the literature. The comparative analysis shows that the proposed model performs better than these methods.
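The two ideas the abstract relies on, variable-length character n-gram words and the Q3 accuracy metric, can be illustrated with a minimal Python sketch. The abstract does not state which n-gram lengths are used or how the n-grams are aligned to residues, so the choice here (all n-grams of length 1 to `max_n` starting at each residue) and the helper names `variable_length_ngrams`, `build_vocabulary`, and `q3_accuracy` are assumptions made for illustration, not the authors' implementation. Q3 is computed in its standard sense: the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label.

```python
from typing import Dict, List, Sequence


def variable_length_ngrams(sequence: str, max_n: int = 3) -> List[List[str]]:
    """For each residue position, collect the n-grams of length 1..max_n starting there.

    Assumed scheme (not specified in the abstract). n-grams that would run
    past the end of the sequence are skipped.
    """
    per_residue = []
    for i in range(len(sequence)):
        grams = [sequence[i:i + n] for n in range(1, max_n + 1)
                 if i + n <= len(sequence)]
        per_residue.append(grams)
    return per_residue


def build_vocabulary(ngram_lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map every distinct n-gram to an integer id; id 0 is left free for padding."""
    vocab: Dict[str, int] = {}
    for grams in ngram_lists:
        for gram in grams:
            if gram not in vocab:
                vocab[gram] = len(vocab) + 1
    return vocab


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E, or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy inputs, invented for illustration only.
grams = variable_length_ngrams("MKVLA", max_n=3)
vocab = build_vocabulary(grams)
score = q3_accuracy("HHHEC", "HHHCC")
```

In a full pipeline, the integer ids from `vocab` would index rows of a learned embedding matrix, and the resulting per-residue vectors would be fed to a bidirectional LSTM that emits one three-state label per residue.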
Pages: 501-507
Page count: 7