EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance

Times cited: 0
Authors
Deng, Yu [1]
Jia, Jianhua [1]
Yi, Mengyue [1]
Affiliation
[1] Jingdezhen Ceram Univ, Sch Informat Engn, Jingdezhen 333403, Peoples R China
Source
BMC GENOMICS | 2024 / Vol. 25 / Issue 1
Keywords
mRNA subcellular localization; Class imbalance; Multi-label; Deep learning; Focal loss; BINDING PROTEINS; CANCER; SITES; YEAST
DOI
10.1186/s12864-024-11173-6
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet-lab techniques such as RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. To address this, researchers have developed several computational methods for predicting mRNA subcellular localization. These methods face class imbalance in multi-label classification, which causes models to favor majority classes and overlook minority classes during training. In addition, traditional feature extraction methods are computationally expensive, produce incomplete features, and may lose critical information. Deep learning methods, on the other hand, face hardware and training-time constraints when handling complex sequences, and may suffer from the curse of dimensionality and overfitting. There is therefore an urgent need for more efficient and accurate prediction models.

Results: To address these issues, we propose EDCLoc, a multi-label classifier for predicting mRNA subcellular localization. EDCLoc reduces the computational burden of training through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. At the end, the model applies global max pooling to further reduce feature dimensions and highlight key features. To tackle class imbalance, we improved the focal loss function to strengthen the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. Additionally, the position weight matrices extracted from the multi-scale CNN filters can match known RNA-binding protein motifs, demonstrating EDCLoc's effectiveness in capturing key sequence features.

Conclusions: EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.
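The abstract names the building blocks of EDCLoc (grouped convolutions of different kernel sizes per level, residual connections, stepwise pooling, and a final global max pooling) but not their exact configuration. The numpy forward pass below is a minimal sketch under assumed settings: the channel count, number of groups, kernel sizes (9, then 7, 5, 3), ReLU placement, and pooling size of 2 are all hypothetical choices for illustration, not EDCLoc's published layers. The real model's code is in the linked repository.

```python
import numpy as np

def one_hot(seq, alphabet="ACGT"):
    """Nucleotide string -> (4, L) one-hot matrix; unknown bases stay all-zero."""
    idx = {b: i for i, b in enumerate(alphabet)}
    x = np.zeros((len(alphabet), len(seq)))
    for j, b in enumerate(seq.upper()):
        if b in idx:
            x[idx[b], j] = 1.0
    return x

def grouped_conv1d(x, w, groups):
    """'Same'-padded grouped 1D convolution (cross-correlation, as in CNN layers).

    x: (C_in, L); w: (C_out, C_in // groups, K) with odd K.
    Each output channel only sees the input channels of its own group.
    """
    c_in, length = x.shape
    c_out, c_per_group, k = w.shape
    assert c_in % groups == 0 and c_out % groups == 0 and c_in // groups == c_per_group
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))           # zero padding keeps length L
    out_per_group = c_out // groups
    y = np.zeros((c_out, length))
    for o in range(c_out):
        g = o // out_per_group                       # group this output channel belongs to
        xg = xp[g * c_per_group:(g + 1) * c_per_group]
        for t in range(length):
            y[o, t] = np.sum(xg[:, t:t + k] * w[o])
    return y

def residual_block(x, w, groups):
    """ReLU(grouped_conv(x)) + x: the skip path gives gradients a direct route."""
    return np.maximum(grouped_conv1d(x, w, groups), 0.0) + x

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; applied after every level, it halves the length."""
    length = (x.shape[1] // size) * size
    return x[:, :length].reshape(x.shape[0], -1, size).max(axis=2)

def forward(seq, stem_w, blocks, groups):
    """Stem conv -> [residual grouped block -> pool] per level -> global max pool."""
    h = np.maximum(grouped_conv1d(one_hot(seq), stem_w, groups=1), 0.0)
    for w in blocks:                                 # each level may use its own kernel size
        h = max_pool1d(residual_block(h, w, groups))
    return h.max(axis=1)                             # global max pooling: one value per channel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, groups = 8, 4                          # hypothetical sizes
    stem_w = rng.normal(size=(channels, 4, 9))
    blocks = [rng.normal(size=(channels, channels // groups, k)) * 0.1 for k in (7, 5, 3)]
    features = forward("ACGT" * 64, stem_w, blocks, groups)
    print(features.shape)                            # one feature per channel
```

Pooling after every level means the later (and more numerous) convolutions run on progressively shorter sequences, which is one plausible reading of how stepwise pooling lightens training; global max pooling then yields a fixed-length vector regardless of transcript length.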
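The abstract states that EDCLoc uses an improved focal loss but does not give the modified formula. As a reference point only, the sketch below implements the standard focal loss of Lin et al., FL(p_t) = -α_t (1 - p_t)^γ log(p_t), applied independently to each label with sigmoid outputs, which is the usual multi-label form. The per-label α and the `inverse_frequency_alpha` heuristic are hypothetical stand-ins for a minority-class weighting; they are not the paper's modification.

```python
import numpy as np

def multilabel_focal_loss(logits, targets, alpha, gamma=2.0, eps=1e-7):
    """Standard focal loss applied to each label independently (mean over all entries).

    logits:  (N, C) raw scores, one column per compartment
    targets: (N, C) multi-hot 0/1 labels
    alpha:   (C,) weight on positives for each label (1 - alpha on negatives)
    gamma:   focusing parameter; gamma = 0 gives alpha-weighted cross-entropy
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    p = 1.0 / (1.0 + np.exp(-logits))                 # sigmoid, not softmax: labels co-occur
    p_t = np.where(targets == 1, p, 1.0 - p)          # probability assigned to the true outcome
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)  # (C,) broadcasts over the N rows
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) entries
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))
    return loss.mean()

def inverse_frequency_alpha(targets, floor=0.25, ceil=0.9):
    """Hypothetical heuristic: weight positives more heavily for rarer labels."""
    positive_rate = np.asarray(targets, dtype=float).mean(axis=0)
    return np.clip(1.0 - positive_rate, floor, ceil)

if __name__ == "__main__":
    y = np.array([[1, 0, 0, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 1],
                  [1, 0, 1, 0]])          # label 0 dominates; labels 1-3 are rare
    alpha = inverse_frequency_alpha(y)    # rare labels receive larger alpha
    logits = np.zeros(y.shape)            # an untrained model: p = 0.5 everywhere
    print(alpha, multilabel_focal_loss(logits, y, alpha))
```

Setting `gamma=0` and `alpha=0.5` for every label recovers half the ordinary binary cross-entropy, which is a convenient check when comparing any modified variant against the baseline.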
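The abstract reports that position weight matrices (PWMs) extracted from the CNN filters can match known RNA-binding protein motifs, without describing the extraction step. A common approach in CNN motif interpretation, assumed here rather than taken from the paper, is to score every sequence window with a filter, keep the windows that activate it strongly, and count bases per position. The `threshold_frac` cut-off and the toy "GGAC" filter are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Nucleotide string -> (4, L) one-hot matrix (rows in ACGT order)."""
    x = np.zeros((len(ALPHABET), len(seq)))
    for j, base in enumerate(seq.upper()):
        if base in ALPHABET:
            x[ALPHABET.index(base), j] = 1.0
    return x

def filter_to_pwm(seqs, w, threshold_frac=0.5, pseudocount=0.0):
    """Turn one convolutional filter into a position weight matrix.

    seqs: iterable of nucleotide strings
    w:    (4, K) filter weights (rows in ACGT order)
    threshold_frac: in (0, 1]; windows scoring at least threshold_frac * best
    score are stacked and their base counts normalised column by column.
    Returns None if the filter never produces a positive activation.
    """
    k = w.shape[1]
    windows, scores = [], []
    for seq in seqs:
        x = one_hot(seq)
        for t in range(x.shape[1] - k + 1):
            window = x[:, t:t + k]
            windows.append(window)
            scores.append(np.sum(window * w))       # filter activation for this window
    if not scores or max(scores) <= 0:
        return None                                 # no positive activation, no motif
    cutoff = threshold_frac * max(scores)
    counts = sum(win for win, s in zip(windows, scores) if s >= cutoff) + pseudocount
    return counts / counts.sum(axis=0, keepdims=True)  # each column sums to 1

if __name__ == "__main__":
    # Hypothetical filter tuned to "GGAC": +0.75 for the matching base, -0.25 otherwise.
    w = one_hot("GGAC") - 0.25
    pwm = filter_to_pwm(["TTGGACTT", "AGGACA"], w, threshold_frac=0.9)
    print(np.round(pwm, 2))
```

In practice the resulting PWMs would then be compared against a database of known RNA-binding protein motifs with a motif-comparison tool; a small `pseudocount` keeps rarely observed bases from receiving exactly zero probability.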
Pages: 17