Transmembrane Protein Prediction Using N-Gram and Random Forests

Cited by: 2
Authors
Li, Jinjin [1]
Xu, Lei [2]
Yang, Chenhui [1]
Jiang, Yi [1]
Affiliations
[1] Xiamen Univ, Sch Informat Sci & Technol, Xiamen 361005, Fujian, Peoples R China
[2] Shenzhen Inst Informat Technol, Software Sch, Shenzhen 518029, Guangdong, Peoples R China
Keywords
Proteomics; Transmembrane Protein; Machine Learning; Random Forests; K-Nearest Neighbor; SVM; N-Gram; Support Vector Machines; Web Server; Selection
DOI
10.1166/jctn.2014.3670
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
With the recent development of proteomics, the importance of transmembrane proteins has been widely acknowledged. Previous bioinformatics studies have focused mainly on the classification of membrane proteins and have overlooked the important role of transmembrane proteins. In this study, we integrated the preceding order of amino acids into a machine learning approach based on a revamped N-gram model to predict transmembrane proteins using only protein sequence information. The framework consists of two steps. First, the N-gram model, revamped for processing protein sequences, was used as the feature extraction algorithm. Second, we compared the performance of four popular classifiers (logistic regression, Random Forests, support vector machine, and K-nearest neighbor) on the N-gram features. N-gram combined with the Random Forests classifier obtained the highest accuracy, 95.6%, which is higher than that of the other methods. These findings can help future studies on the structure and function of transmembrane proteins, drug design, and the classification of membrane proteins. In addition, a publicly accessible web server and software were established.
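The abstract does not specify how the N-gram model was revamped, so the following is only a minimal sketch of the general idea behind the first step: turning a protein sequence into a fixed-length vector of overlapping n-gram frequencies. The function name `ngram_features`, the 20-letter alphabet constant, and the choice to skip non-standard residues are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids form the n-gram vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence, n=2):
    """Return normalized frequencies of overlapping n-grams.

    The vector has len(AMINO_ACIDS) ** n entries, ordered by the
    Cartesian product of the alphabet (AA, AC, AD, ... for n=2).
    N-grams containing non-standard residues are ignored.
    """
    sequence = sequence.upper()
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    # Slide a window of width n over the sequence and count each n-gram.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Normalize by the number of in-vocabulary n-grams only.
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]


# Hypothetical toy sequence, used only to show the call.
features = ngram_features("ACAC", n=2)
print(len(features))
```

Vectors of this kind could then be passed to any of the classifiers the abstract compares. Normalizing by the n-gram count keeps sequences of different lengths on a comparable scale, which is one common design choice among several.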
Pages: 2526-2534
Number of pages: 9