ProPythia: A Python package for protein classification based on machine and deep learning

Authors
Sequeira, Ana Marta [1]
Lousa, Diana [2]
Rocha, Miguel [1]
Affiliations
[1] Univ Minho, CEB Ctr Biol Engn, P-4710057 Braga, Portugal
[2] Univ Nova Lisboa, Prot Modelling Lab, Inst Tecnol Quim & Biol Antonio Xavier ITQB NOVA, P-2780157 Oeiras, Portugal
Keywords
Machine learning; Deep learning; Protein; peptide classification; Python Package; Antimicrobial peptide; Enzyme; WEB SERVER; DESCRIPTORS; PREDICTION; TOOL
DOI
10.1016/j.neucom.2021.07.102
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The field of protein data mining has grown rapidly in recent years. Characterizing proteins and determining their function from their amino acid sequences are challenging and long-standing problems, in which Bioinformatics and Machine Learning play an emerging role. A myriad of machine and deep learning algorithms have been applied to these tasks with exciting results. However, tools and platforms that take protein sequences as input, calculate protein features, and run both Machine Learning (ML) and Deep Learning (DL) pipelines are still scarce, and the existing ones are limited in performance, user-friendliness, and domain of application. Here, to address these limitations, we propose ProPythia, a generic and modular Python package that makes it easy to deploy ML and DL approaches for a plethora of problems in protein sequence analysis and classification. It facilitates the implementation, comparison, and validation of the major tasks in ML or DL pipelines, including modules to read and alter sequences, calculate protein features, preprocess datasets, execute feature selection and dimensionality reduction, perform clustering and manifold analysis, and train and optimize ML/DL models and use them to make predictions. With its adaptable modular architecture, ProPythia is a versatile and easy-to-use tool for transforming protein data into valuable knowledge, even for people unfamiliar with ML code. The platform was tested in several applications, and its results were compared with those reported in the literature. Here, we illustrate its applicability in two case studies: the prediction of antimicrobial peptides and the prediction of enzyme Enzyme Commission (EC) numbers. Furthermore, we assess the performance of the different descriptors on four different protein classification challenges.
Its source code and documentation, including a user guide and case studies, are freely available at https://github.com/BioSystemsUM/propythia. © 2021 Elsevier B.V. All rights reserved.
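To make the kind of pipeline described in the abstract concrete, the following is a minimal sketch in plain Python of two of its stages: computing a sequence descriptor and training a classifier on it. It does not use ProPythia and does not reflect its API. The descriptor (amino acid composition), the nearest-centroid classifier, the function names, the toy sequences, and the class labels are all assumptions chosen for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def fit_centroids(sequences, labels):
    """'Train' a nearest-centroid model: average the features of each class."""
    sizes = Counter(labels)
    sums = {}
    for seq, label in zip(sequences, labels):
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(aa_composition(seq)):
            acc[i] += value
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, sequence):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    feats = aa_composition(sequence)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical toy data: made-up peptides, not taken from any real dataset.
train_seqs = ["KKLLKKLLKK", "KRKRLLKKRL", "DEDEGGSSDE", "EEDDSGGEDS"]
train_labels = ["cationic", "cationic", "acidic", "acidic"]

centroids = fit_centroids(train_seqs, train_labels)
prediction = predict(centroids, "KKRLLKKL")
```

A real workflow would add the remaining stages the abstract lists (dataset preprocessing, feature selection, model optimization, and validation) and would normally rely on established ML libraries instead of a hand-written classifier.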
Pages: 172-182
Page count: 11
Related papers
50 in total
  • [1] ProPythia: A Python package for protein classification based on machine and deep learning
    Sequeira, Ana Marta
    Lousa, Diana
    Rocha, Miguel
    [J]. Neurocomputing, 2022, 484 : 172 - 182
  • [2] Geomstats: A Python Package for Riemannian Geometry in Machine Learning
    Miolane, Nina
    Guigui, Nicolas
    Le Brigant, Alice
    Mathe, Johan
    Hou, Benjamin
    Thanwerdas, Yann
    Heyder, Stefan
    Peltre, Olivier
    Koep, Niklas
    Zaatiti, Hadi
    Hajri, Hatem
    Cabanes, Yann
    Gerald, Thomas
    Chauchat, Paul
    Shewmake, Christian
    Brooks, Daniel
    Kainz, Bernhard
    Donnat, Claire
    Holmes, Susan
    Pennec, Xavier
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [3] Glycowork: A Python package for glycan data science and machine learning
    Thomes, Luc
    Burkholz, Rebekka
    Bojar, Daniel
    [J]. GLYCOBIOLOGY, 2021, 31 (10) : 1240 - 1244
  • [4] Causal ML: Python package for causal inference machine learning
    Zhao, Yang
    Liu, Qing
    [J]. SOFTWAREX, 2023, 21
  • [5] Machine learning, deep learning and Python language in field of geology
    Zhou YongZhang
    Wang Jun
    Zuo RenGuang
    Xiao Fan
    Shen WenJie
    Wang ShuGong
    [J]. ACTA PETROLOGICA SINICA, 2018, 34 (11) : 3173 - 3178
  • [6] PyGenePlexus: a Python package for gene discovery using network-based machine learning
    Mancuso, Christopher A.
    Liu, Renming
    Krishnan, Arjun
    [J]. BIOINFORMATICS, 2023, 39 (02)
  • [7] Churn Analysis with Machine Learning Classification Algorithms in Python
    Ozdemir, Onur
    Batar, Mustafa
    Isik, Ali Hakan
    [J]. ARTIFICIAL INTELLIGENCE AND APPLIED MATHEMATICS IN ENGINEERING PROBLEMS, 2020, 43 : 844 - 852
  • [8] DeepForest: A Python package for RGB deep learning tree crown delineation
    Weinstein, Ben G.
    Marconi, Sergio
    Aubry-Kientz, Melaine
    Vincent, Gregoire
    Senyondo, Henry
    White, Ethan P.
    [J]. METHODS IN ECOLOGY AND EVOLUTION, 2020, 11 (12) : 1743 - 1751
  • [9] tension: A Python package for FORCE learning
    Liu, Lu Bin
    Losonczy, Attila
    Liao, Zhenrui
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (12)
  • [10] pyts: A Python Package for Time Series Classification
    Faouzi, Johann
    Janati, Hicham
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21