EC2Vec: A Machine Learning Method to Embed Enzyme Commission (EC) Numbers into Vector Representations

Authors
Liu, Mengmeng [1]
Ni, Xialong [2]
Ramanujam, J. [1, 3]
Brylinski, Michal [2, 3]
Affiliations
[1] Louisiana State Univ, Div Elect & Comp Engn, Baton Rouge, LA 70803 USA
[2] Louisiana State Univ, Dept Biol Sci, Baton Rouge, LA 70803 USA
[3] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
DOI
10.1021/acs.jcim.4c02161
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Enzyme commission (EC) numbers play a vital role in classifying enzymes and understanding their functions in enzyme-related research. Although accurate and informative encoding of EC numbers is essential for enhancing the effectiveness of machine learning applications, simple EC encoding approaches suffer from limitations such as false numerical order and high sparsity. To address these issues, we developed EC2Vec, a multimodal autoencoder that preserves the categorical nature of EC numbers and leverages their hierarchical relationships, resulting in more meaningful and informative representations. EC2Vec encodes each digit of the EC number as a categorical token and then processes these embeddings through a 1D convolutional layer to capture their relationships. Comprehensive benchmarking against a large collection of EC numbers indicates that EC2Vec outperforms simple encoding methods. The t-SNE visualization of EC2Vec embeddings revealed distinct clusters corresponding to different enzyme classes, demonstrating that the hierarchical structure of the EC numbers is effectively captured. In downstream machine learning applications, EC2Vec embeddings outperformed other EC encoding methods in the reaction-EC pair classification task, underscoring its robustness and utility for enzyme-related research and bioinformatics applications.
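The encoding idea in the abstract (one categorical token per EC level, followed by a 1D convolution over the token embeddings) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the per-level vocabularies, the embedding size, the kernel width, and the random untrained weights are all assumptions made for illustration, and the autoencoder training step is omitted entirely.

```python
import random


def tokenize_ec(ec):
    """Split 'a.b.c.d' into four categorical tokens (strings, not numbers)."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def build_vocabs(ec_numbers):
    """Build one vocabulary per hierarchy level (hypothetical design choice)."""
    vocabs = [dict() for _ in range(4)]
    for ec in ec_numbers:
        for level, token in enumerate(tokenize_ec(ec)):
            vocabs[level].setdefault(token, len(vocabs[level]))
    return vocabs


def init_embeddings(vocabs, dim, seed=0):
    """Random (untrained) embedding table per level: tables[level][id] -> vector."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]
            for vocab in vocabs]


def init_kernels(n_kernels, width, dim, seed=1):
    """Random (untrained) convolution kernels of shape [width][dim]."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
            for _ in range(n_kernels)]


def conv1d(seq, kernel):
    """Valid 1D convolution along the level axis; returns len(seq)-width+1 values."""
    width, dim = len(kernel), len(kernel[0])
    return [
        sum(seq[start + i][j] * kernel[i][j]
            for i in range(width) for j in range(dim))
        for start in range(len(seq) - width + 1)
    ]


def encode(ec, vocabs, tables, kernels):
    """Look up one embedding per level, then concatenate all kernel outputs."""
    tokens = tokenize_ec(ec)
    seq = [tables[level][vocabs[level][tok]] for level, tok in enumerate(tokens)]
    features = []
    for kernel in kernels:
        features.extend(conv1d(seq, kernel))
    return features


if __name__ == "__main__":
    ecs = ["1.1.1.1", "3.4.21.4", "3.4.21.5", "2.7.11.1"]
    vocabs = build_vocabs(ecs)
    tables = init_embeddings(vocabs, dim=4)
    kernels = init_kernels(n_kernels=3, width=2, dim=4)
    for ec in ecs:
        print(ec, [round(x, 3) for x in encode(ec, vocabs, tables, kernels)])
```

Because tokens are looked up as categories rather than read as magnitudes, "21" is not treated as "larger" than "4", which avoids the false numerical order the abstract mentions. In this sketch, two EC numbers that share their first three levels also share every convolution window that does not touch the fourth level, which is one simple way hierarchical proximity can surface in the output vector.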
Pages: 2173-2179
Page count: 7