Protein Family Classification from Scratch: A CNN Based Deep Learning Approach

Cited by: 18
Authors
Zhang, Da [1]
Kabuka, Mansur R. [2]
Affiliations
[1] Univ Miami, Dept Elect & Comp Engn, Coral Gables, FL 33145 USA
[2] Univ Miami, Coral Gables, FL 33146 USA
Funding
National Institutes of Health (USA);
Keywords
Proteins; Feature extraction; Amino acids; Hidden Markov models; Deep learning; Data mining; Machine learning algorithms; Protein family classification; convolutional neural network; feature engineering; PREDICTION;
DOI
10.1109/TCBB.2020.2966633
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Next-generation sequencing techniques give us the opportunity to generate sequenced proteins and to identify their biological families and functions. However, compared with identified proteins, uncharacterized proteins still make up a notable percentage of all proteins in bioinformatics research. Traditional family classification methods often focus on extracting N-gram features from sequences while ignoring motif information as well as affinity information between motifs and adjacent amino acids. Previous clustering-based algorithms have typically defined protein features using domain knowledge and annotated protein families based on extensive data samples. In this paper, we apply CNN-based amino acid representation learning with a limited number of characterized proteins and evaluate how well protein families are annotated when amino acid location information is taken into account. Additionally, we evaluate our approach on all reviewed protein sequences, together with their families, retrieved from the UniProt database. Finally, we verify our model on unreviewed protein records, which are typically ignored by other methods.
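The contrast the abstract draws between N-gram features and position-aware convolution can be illustrated with a small sketch. This is not the authors' architecture: the function names, the example sequence, and the hand-set filter are all hypothetical, and a real CNN would learn its filter weights from labelled sequences instead of fixing them to a known motif.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def ngram_counts(seq, n=3):
    """Count overlapping n-grams; where each one occurs is discarded."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def one_hot(seq):
    """Encode each residue as a 20-dimensional one-hot row, in sequence order."""
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows

def conv1d(encoded, kernel):
    """Slide one filter (width x 20) along the sequence, without padding."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(x * w for x, w in zip(encoded[start + k], kernel[k]))
        out.append(total)
    return out

def motif_kernel(motif):
    """Hypothetical hand-set filter that responds most strongly to one motif."""
    return one_hot(motif)

seq = "MKTAYIAKQR"  # made-up example sequence, not taken from UniProt
counts = ngram_counts(seq, n=3)
activations = conv1d(one_hot(seq), motif_kernel("AYI"))
best_position = max(range(len(activations)), key=activations.__getitem__)
```

Here `counts` records only that the trigram `AYI` occurs, whereas `activations` has one value per window position, so `best_position` also recovers where along the sequence the motif sits.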
Pages: 1996-2007
Page count: 12
Related papers
50 in total
  • [21] Learning Transferable 3D-CNN for MRI-Based Brain Disorder Classification from Scratch: An Empirical Study
    Guan, Hao
    Wang, Li
    Yao, Dongren
    Bozoki, Andrea
    Liu, Mingxia
    [J]. MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2021, 2021, 12966 : 10 - 19
  • [22] ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost
    Thongsuwan, Setthanun
    Jaiyen, Saichon
    Padcharoen, Anantachai
    Agarwal, Praveen
    [J]. NUCLEAR ENGINEERING AND TECHNOLOGY, 2021, 53 (02) : 522 - 531
  • [23] Comparison of CNN-based deep learning architectures for rice diseases classification
    Ahad, Md Taimur
    Li, Yan
    Song, Bo
    Bhuiyan, Touhid
    [J]. ARTIFICIAL INTELLIGENCE IN AGRICULTURE, 2023, 9 : 22 - 35
  • [24] CNN-based hybrid deep learning framework for human activity classification
    Ahmad, Naeem
    Ghosh, Sunit
    Rout, Jitendra Kumar
    [J]. INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2024, 44 (02) : 74 - 83
  • [25] CNN Based Deep Learning Approach for Automatic Malaria Parasite Detection
    Turuk, Mousami
    Sreemathy, R.
    Kadiyala, Sadhvika
    Kotecha, Sakshi
    Kulkarni, Vaishnavi
    [J]. IAENG International Journal of Computer Science, 2022, 49 (03)
  • [26] Hybrid Deep Learning Approach Based on LSTM and CNN for Malware Detection
    Thakur, Preeti
    Kansal, Vineet
    Rishiwal, Vinay
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2024, 136 (03) : 1879 - 1901
  • [27] Hybrid Approach for Taxonomic Classification Based on Deep Learning
    Soliman, Naglaa F.
    Abd-Alhalem, Samia M.
    El-Shafai, Walid
    Abdulrahman, Salah Eldin S. E.
    Ismaiel, N.
    El-Rabaie, El-Sayed M.
    Algarni, Abeer D.
    Algarni, Fatimah
    Alhussan, Amel A.
    Abd El-Samie, Fathi E.
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (03) : 1881 - 1891
  • [28] A Deep Learning-based Approach for WBC Classification
    Ramyashree, K. S.
    Sharada, B.
    Bhairava, R.
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN INFORMATION TECHNOLOGY, ICITIIT 2024, 2024,
  • [29] Leukemia classification using the deep learning method of CNN
    Arivuselvam, B.
    Sudha, S.
    [J]. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY, 2022, 30 (03) : 567 - 585
  • [30] Speed Estimation from Vibrations Using a Deep Learning CNN Approach
    Karlsson, Rickard
    Hendeby, Gustaf
    [J]. IEEE Sensors Letters, 2021, 5 (03)