EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance

Times cited: 0
Authors
Deng, Yu [1]
Jia, Jianhua [1]
Yi, Mengyue [1]
Affiliations
[1] Jingdezhen Ceram Univ, Sch Informat Engn, Jingdezhen 333403, Peoples R China
Source
BMC GENOMICS | 2024 / Vol. 25 / Issue 01
Keywords
mRNA subcellular localization; Class imbalance; Multi-label; Deep learning; Focal loss; BINDING PROTEINS; CANCER; SITES; YEAST
DOI
10.1186/s12864-024-11173-6
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet lab techniques like RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. Researchers have developed several computational methods to predict mRNA subcellular localization to address this. These methods face the problem of class imbalance in multi-label classification, causing models to favor majority classes and overlook minority classes during training. Additionally, traditional feature extraction methods have high computational costs, incomplete features, and may lead to the loss of critical information. On the other hand, deep learning methods face challenges related to hardware performance and training time when handling complex sequences. They may suffer from the curse of dimensionality and overfitting problems. Therefore, there is an urgent need for more efficient and accurate prediction models.

Results: To address these issues, we propose a multi-label classifier, EDCLoc, for predicting mRNA subcellular localization. EDCLoc reduces training pressure through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. The model employs global max pooling at the end to further reduce feature dimensions and highlight key features. To tackle class imbalance, we improved the focal loss function to enhance the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. Additionally, the position weight matrix extracted by multi-scale CNN filters can match known RNA-binding protein motifs, demonstrating EDCLoc's effectiveness in capturing key sequence features.

Conclusions: EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance issues in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.
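The abstract does not say how the focal loss was modified, so the following is only a minimal sketch of the standard (unmodified) focal loss applied independently to each label of a multi-label problem. It is not EDCLoc's improved loss. The function name `multilabel_focal_loss` and the defaults `alpha=0.25` and `gamma=2.0` are illustrative assumptions, not values taken from the paper.

```python
import math


def multilabel_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean standard focal loss over all (sample, label) pairs.

    NOTE: illustrative sketch only; this is the generic focal loss,
    not the improved variant used by EDCLoc.

    probs   -- list of rows; each row holds per-label sigmoid probabilities
    targets -- list of rows of 0/1 labels, same shape as probs
    alpha   -- weight for positive labels (negatives get 1 - alpha); assumed value
    gamma   -- focusing parameter; larger values down-weight easy examples more
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)         # clamp to avoid log(0)
            p_t = p if y == 1 else 1.0 - p          # probability given to the true outcome
            a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
            total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
            count += 1
    return total / count


# Usage: two samples, three hypothetical localization labels each.
example_probs = [[0.9, 0.2, 0.6], [0.3, 0.8, 0.1]]
example_targets = [[1, 0, 1], [0, 1, 1]]
print(multilabel_focal_loss(example_probs, example_targets))
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of labels the model already predicts confidently, so training focuses on hard, often minority-class, labels. With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy.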
Pages: 17
Related papers
48 in total
  • [21] CFPLncLoc: A multi-label lncRNA subcellular localization prediction based on Chaos game representation and centralized feature pyramid
    Wang, Sheng
    Yu, Zu-Guo
    Han, Guo-Sheng
    Sun, Xin-Gen
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2025, 297
  • [22] Imbalanced Multi-Modal Multi-Label Learning for Subcellular Localization Prediction of Human Proteins with Both Single and Multiple Sites
    He, Jianjun
    Gu, Hong
    Liu, Wenqi
    PLOS ONE, 2012, 7 (06)
  • [23] Predicting the Subcellular Localization of Multi-site Protein Based on Fusion Feature and Multi-label Deep Forest Model
    Yang, Hongri
    Meng, Qingfang
    Chen, Yuehui
    Zhong, Lianxin
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 334 - 344
  • [24] Multi-label Classification with Partial Annotations using Class-aware Selective Loss
    Ben-Baruch, Emanuel
    Ridnik, Tal
    Friedman, Itamar
    Ben-Cohen, Avi
    Zamir, Nadav
    Noy, Asaf
    Zelnik-Manor, Lihi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 4754 - 4762
  • [25] Text prediction method based on multi-label attributes and improved maximum entropy model
    Yin, Yi
    Feng, Dan
    Li, Yue
    Yin, Shuifang
    Shi, Zhan
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (02) : 1097 - 1109
  • [26] ncRNALocate-EL: a multi-label ncRNA subcellular locality prediction model based on ensemble learning
    Bai, Tao
    Liu, Bin
    BRIEFINGS IN FUNCTIONAL GENOMICS, 2023, 22 (05) : 442 - 452
  • [27] GP-HTNLoc: A graph prototype head-tail network-based model for multi-label subcellular localization prediction of ncRNAs
    Han, Shuangkai
    Liu, Lin
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2024, 23 : 2034 - 2048
  • [28] Euk-mPSL: A Deep Learning Framework for Multi-Label Eukaryotic Protein Subcellular Localization Prediction with Imbalanced Datasets
    Yan, Ziming
    Liu, Fu
    Liu, Yun
    PROCEEDINGS OF 2023 4TH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE FOR MEDICINE SCIENCE, ISAIMS 2023, 2023 : 1141 - 1145
  • [29] Deep learning-based multi-functional therapeutic peptides prediction with a multi-label focal dice loss function
    Fan, Henghui
    Yan, Wenhui
    Wang, Lihua
    Liu, Jie
    Bin, Yannan
    Xia, Junfeng
    BIOINFORMATICS, 2023, 39 (06)
  • [30] mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction
    Wan, Shibiao
    Mak, Man-Wai
    Kung, Sun-Yuan
    ANALYTICAL BIOCHEMISTRY, 2015, 473 : 14 - 27