Protein Family Classification from Scratch: A CNN Based Deep Learning Approach

Cited by: 18
Authors
Zhang, Da [1]
Kabuka, Mansur R. [2]
Affiliations
[1] Univ Miami, Dept Elect & Comp Engn, Coral Gables, FL 33145 USA
[2] Univ Miami, Coral Gables, FL 33146 USA
Funding
National Institutes of Health (USA);
Keywords
Proteins; Feature extraction; Amino acids; Hidden Markov models; Deep learning; Data mining; Machine learning algorithms; Protein family classification; convolutional neural network; feature engineering; PREDICTION;
DOI
10.1109/TCBB.2020.2966633
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Next-generation sequencing techniques provide an opportunity to generate sequenced proteins and to identify the biological families and functions of these proteins. However, compared with identified proteins, uncharacterized proteins make up a notable percentage of all proteins in bioinformatics research. Traditional family classification methods often focus on extracting N-gram features from sequences while ignoring motif information as well as affinity information between motifs and adjacent amino acids. Previous clustering-based algorithms have typically been used to define protein features with domain knowledge and to annotate protein families based on extensive data samples. In this paper, we apply CNN-based amino acid representation learning with limited characterized proteins to explore performance on annotated protein families by taking into account amino acid location information. Additionally, we apply the method to all reviewed protein sequences with their families retrieved from the UniProt database to evaluate our approach. Last but not least, we verify our model using unreviewed protein records, which are typically ignored by other methods.
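
The contrast the abstract draws between N-gram features and location-aware input can be illustrated with a minimal sketch. This is not the authors' code: the helper names `ngram_counts` and `one_hot_encode`, the 20-letter alphabet ordering, and the zero-padding scheme are all assumptions made for illustration. N-gram counting discards where a residue occurs, while a position-by-residue matrix (the kind of input a CNN could convolve over) keeps it.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def ngram_counts(seq, n=3):
    """Count overlapping n-grams; position in the sequence is discarded."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def one_hot_encode(seq, max_len):
    """Return a max_len x 20 matrix whose row i marks the residue at position i.

    Sequences are truncated or zero-padded to max_len (hypothetical choice);
    residues outside the 20-letter alphabet leave their row all zeros.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix
```

For example, `ngram_counts("MKVMKV")` reports that `MKV` occurs twice but not where, whereas `one_hot_encode("MKVMKV", 8)` records each residue in its own row, so position is preserved.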
Pages: 1996-2007
Number of pages: 12