ProtPlat: an efficient pre-training platform for protein classification based on FastText

Cited by: 4
Authors
Jin, Yuan
Yang, Yang [1 ]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein sequence classification; ProtPlat; Pre-training; Web server; SUBCELLULAR-LOCALIZATION; PREDICTION;
DOI
10.1186/s12859-022-04604-2
CLC number (Chinese Library Classification)
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Background: Over the past decades, benefiting from the rapid growth of protein sequence data in public databases, many machine learning methods have been developed to predict the physicochemical properties or functions of proteins from amino acid sequence features. However, prediction performance often suffers from a lack of labeled data. In recent years, pre-training methods have been widely studied to address the small-sample issue in computer vision and natural language processing, whereas pre-training techniques designed specifically for protein sequences remain scarce.
Results: In this paper, we propose ProtPlat, a pre-training platform for representing protein sequences, which uses the Pfam database to train a three-layer neural network and then fine-tunes the model with task-specific training data from downstream tasks. ProtPlat learns good representations for amino acids while achieving efficient classification. We conduct experiments on three protein classification tasks: the identification of type III secreted effectors, the prediction of subcellular localization, and the recognition of signal peptides. The experimental results show that pre-training effectively enhances model performance and that ProtPlat is competitive with state-of-the-art predictors, especially on small datasets. We implement ProtPlat as a publicly accessible web service (https://compbio.sjtu.edu.cn/protplat).
Conclusions: To enhance the feature representation of protein amino acid sequences and improve the performance of sequence-based classification tasks, we develop ProtPlat, a general platform for the pre-training of protein sequences, featuring large-scale supervised training on the Pfam database and an efficient learning model, FastText. The experimental results on three downstream classification tasks demonstrate the efficacy of ProtPlat.
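As a rough illustration of the pre-train-then-fine-tune workflow described in the abstract (not the authors' released code), the sketch below uses the open-source fastText Python bindings to train a supervised classifier on Pfam-family labels over k-mer tokens, export the learned k-mer embeddings, and reuse them to warm-start a downstream classifier. The file names, the 3-mer tokenization, and all hyperparameters are assumptions for illustration only.

```python
# Minimal sketch of a ProtPlat-style pipeline with the fastText Python bindings.
# Assumptions (not from the paper): file names, 3-mer tokenization, hyperparameters.
import fasttext

def to_kmers(seq: str, k: int = 3) -> str:
    """Turn a protein sequence into space-separated overlapping k-mer 'words'."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

# --- Pre-training: supervised classification of Pfam families ---------------
# pfam_train.txt (hypothetical) has lines like: "__label__PF00069 MKV KVL VLS ..."
pretrain_model = fasttext.train_supervised(
    input="pfam_train.txt", epoch=5, lr=0.5, dim=100, wordNgrams=1
)

# Export the learned k-mer embeddings in .vec format so they can seed a new model.
words = pretrain_model.get_words()
with open("pfam_kmers.vec", "w") as fh:
    fh.write(f"{len(words)} {pretrain_model.get_dimension()}\n")
    for w in words:
        vec = " ".join(f"{x:.5f}" for x in pretrain_model.get_word_vector(w))
        fh.write(f"{w} {vec}\n")

# --- Fine-tuning: downstream task (e.g. signal-peptide recognition) ---------
# downstream_train.txt uses the same k-mer tokens with task-specific labels.
task_model = fasttext.train_supervised(
    input="downstream_train.txt",
    pretrainedVectors="pfam_kmers.vec",  # warm-start from the Pfam pre-training
    epoch=25, lr=0.1, dim=100,           # dim must match the exported vectors
)
print(task_model.predict(to_kmers("MKWVTFISLLLLFSSAYS")))  # (labels, probabilities)
```

The key design point this sketch mirrors is that the pre-training stage is itself a supervised classification problem (predicting Pfam families), so the same efficient FastText architecture serves both stages; only the labels and the output layer change between pre-training and the downstream task.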
Pages: 17