ProLanGO2: Protein Function Prediction with Ensemble of Encoder-Decoder Networks

Cited by: 2
Authors
Hippe, Kyle [1]
Gbenro, Sola [1]
Cao, Renzhi [1]
Affiliations
[1] Pacific Lutheran Univ, Dept Comp Sci, Tacoma, WA 98447 USA
Keywords
Protein function prediction; Recurrent neural network; Machine learning; Automated prediction; Annotations; Sequences; Database
DOI
10.1145/3388440.3414701
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Predicting protein function from sequence is a major challenge in computational biology. Traditional methods that search a protein sequence against existing databases may not work well in practice, particularly when the database contains little or no homology to the query. We introduce ProLanGO2, a method that uses natural language processing and machine learning techniques to predict protein function from the protein sequence alone. The method was benchmarked blindly in the latest Critical Assessment of protein Function Annotation algorithms (CAFA 4) experiment. ProLanGO2 differs from the original ProLanGO in several ways. First, it uses the latest version of the UniProt database. Second, the UniProt database is filtered by the newly created fragment sequence database (FSD) to prepare the protein sequence language. Third, models are trained on the dataset with an Encoder-Decoder network, a model consisting of two RNNs (an encoder and a decoder). Fourth, if no k-mers of a protein sequence exist in the FSD, we take the ten GO terms with the highest probability across all UniProt sequences that contain no k-mers in the FSD and use those ten GO terms as a backup prediction for the new sequence. Finally, we selected the 100 best-performing models and explored all combinations of those models to find the best-performing ensemble. We benchmarked these combinations on the CAFA 3 dataset and selected the three top-performing ensemble models for prediction in the latest CAFA 4 experiment, where we participated as CaoLab. We also evaluated ProLanGO2 on 253 unseen sequences taken from the UniProt database and compared it with several other protein function prediction methods. The results show that our method achieves strong performance among sequence-based protein function prediction methods.
Our method is available on GitHub: https://github.com/caorenzhi/ProLanGO2.git.
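The backup step described in the abstract (the fourth change) can be illustrated with a minimal Python sketch. It is not taken from the ProLanGO2 repository: the function names, the k-mer lengths, the toy sequences, and the placeholder labels "GO:A", "GO:B", "GO:C" are all hypothetical, and "highest probability" is assumed here to mean the relative frequency of a GO term among the sequences that share no k-mer with the FSD.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_known_kmer(seq, fsd, ks):
    """True if any k-mer of seq (for any k in ks) appears in the fragment set."""
    return any(kmers(seq, k) & fsd for k in ks)


def fallback_go_terms(annotated, fsd, ks, top_n=10):
    """Rank GO terms by frequency among sequences with no k-mer in fsd.

    annotated is an iterable of (sequence, GO term list) pairs.
    Returns up to top_n (term, relative frequency) pairs.
    """
    counts = Counter()
    uncovered = 0
    for seq, terms in annotated:
        if not has_known_kmer(seq, fsd, ks):
            uncovered += 1
            counts.update(set(terms))  # count each term once per sequence
    if uncovered == 0:
        return []
    return [(term, n / uncovered) for term, n in counts.most_common(top_n)]


def predict(seq, fsd, ks, model, fallback):
    """Use the model when the sequence is covered by fsd, else the fallback."""
    if has_known_kmer(seq, fsd, ks):
        return model(seq)
    return fallback


# Toy data (hypothetical): a two-fragment FSD and three annotated sequences.
fsd = {"MKT", "GAVL"}
ks = (3, 4)
annotated = [
    ("MKTAYIAK", ["GO:A"]),        # shares the 3-mer "MKT" with the FSD
    ("QQQWWW", ["GO:B", "GO:C"]),  # no k-mer in the FSD
    ("PPPHHH", ["GO:B"]),          # no k-mer in the FSD
]
fallback = fallback_go_terms(annotated, fsd, ks, top_n=2)
print(fallback)
```

In this sketch `model` is any callable that maps a sequence to predicted GO terms; in the paper that role is played by the trained Encoder-Decoder ensemble, whose interface is not specified in the abstract.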
Pages: 6