EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance

Cited by: 0
Authors
Deng, Yu [1]
Jia, Jianhua [1]
Yi, Mengyue [1]
Affiliation
[1] Jingdezhen Ceram Univ, Sch Informat Engn, Jingdezhen 333403, Peoples R China
Source
BMC GENOMICS | 2024 / Vol. 25 / Issue 01
Keywords
mRNA subcellular localization; Class imbalance; Multi-label; Deep learning; Focal loss; BINDING PROTEINS; CANCER; SITES; YEAST;
DOI
10.1186/s12864-024-11173-6
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet lab techniques like RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. Researchers have developed several computational methods to predict mRNA subcellular localization to address this. These methods face the problem of class imbalance in multi-label classification, causing models to favor majority classes and overlook minority classes during training. Additionally, traditional feature extraction methods have high computational costs, incomplete features, and may lead to the loss of critical information. On the other hand, deep learning methods face challenges related to hardware performance and training time when handling complex sequences. They may suffer from the curse of dimensionality and overfitting problems. Therefore, there is an urgent need for more efficient and accurate prediction models.
Results: To address these issues, we propose a multi-label classifier, EDCLoc, for predicting mRNA subcellular localization. EDCLoc reduces training pressure through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. The model employs global max pooling at the end to further reduce feature dimensions and highlight key features. To tackle class imbalance, we improved the focal loss function to enhance the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. Additionally, the position weight matrix extracted by multi-scale CNN filters can match known RNA-binding protein motifs, demonstrating EDCLoc's effectiveness in capturing key sequence features.
Conclusions: EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance issues in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.
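The abstract does not spell out how the focal loss was improved, so the sketch below shows only the standard per-label focal loss that such a modification would presumably start from, applied independently to each label of a multi-label target. It is not EDCLoc's loss. The function names and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions and are not taken from the paper or its repository.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Standard focal loss for one label (NOT the paper's improved variant).

    gamma and alpha are assumed defaults, not values from EDCLoc.
    """
    p = sigmoid(logit)
    # p_t is the probability the model assigns to the true outcome.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over all samples and labels.

    logits, targets: lists of rows, one row per sample, one entry per label
    (for example, one entry per subcellular compartment).
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            total += binary_focal_loss(z, y, gamma, alpha)
            count += 1
    return total / count


# Hypothetical batch: 2 sequences, 3 compartments each.
example_logits = [[2.5, -1.0, 0.3], [-3.0, 0.8, -0.2]]
example_targets = [[1, 0, 1], [0, 1, 0]]
example_loss = multilabel_focal_loss(example_logits, example_targets)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the usual binary cross-entropy. Raising `gamma` shifts the weight toward hard, often minority-class, examples, which is the general idea the abstract refers to.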
Pages: 17