EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance

Citations: 0
Authors
Deng, Yu [1]
Jia, Jianhua [1]
Yi, Mengyue [1]
Affiliations
[1] Jingdezhen Ceram Univ, Sch Informat Engn, Jingdezhen 333403, Peoples R China
Source
BMC GENOMICS | 2024 / Vol. 25 / Issue 01
Keywords
mRNA subcellular localization; Class imbalance; Multi-label; Deep learning; Focal loss; BINDING PROTEINS; CANCER; SITES; YEAST;
DOI
10.1186/s12864-024-11173-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet lab techniques like RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. Researchers have developed several computational methods to predict mRNA subcellular localization to address this. These methods face the problem of class imbalance in multi-label classification, causing models to favor majority classes and overlook minority classes during training. Additionally, traditional feature extraction methods have high computational costs, incomplete features, and may lead to the loss of critical information. On the other hand, deep learning methods face challenges related to hardware performance and training time when handling complex sequences. They may suffer from the curse of dimensionality and overfitting problems. Therefore, there is an urgent need for more efficient and accurate prediction models.
Results: To address these issues, we propose a multi-label classifier, EDCLoc, for predicting mRNA subcellular localization. EDCLoc reduces training pressure through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. The model employs global max pooling at the end to further reduce feature dimensions and highlight key features. To tackle class imbalance, we improved the focal loss function to enhance the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. Additionally, the position weight matrix extracted by multi-scale CNN filters can match known RNA-binding protein motifs, demonstrating EDCLoc's effectiveness in capturing key sequence features.
Conclusions: EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance issues in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.
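The abstract does not say how the focal loss was modified, so the block below is only a minimal sketch of the standard sigmoid focal loss applied independently to each label, not EDCLoc's improved variant. The function name `multilabel_focal_loss` and the parameters `alpha` (per-label weight on positives) and `gamma` (focusing exponent) are assumptions made for illustration.

```python
import numpy as np

def multilabel_focal_loss(logits, targets, alpha=None, gamma=2.0, eps=1e-7):
    """Standard sigmoid focal loss, averaged over samples and labels.

    Illustrative only: this is the generic formulation, NOT the improved
    loss described in the EDCLoc paper.

    logits  : array of shape (n_samples, n_labels), raw model outputs
    targets : array of the same shape with entries in {0, 1}
    alpha   : optional array of shape (n_labels,), weight in (0, 1) given
              to the positive class of each label (hypothetical parameter)
    gamma   : focusing exponent; gamma = 0 recovers binary cross-entropy
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Per-label sigmoid probabilities, clipped to keep log() finite.
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, eps, 1.0 - eps)

    # p_t is the probability the model assigns to the true class of each label.
    p_t = np.where(targets == 1, probs, 1.0 - probs)

    # Optional class-balancing weight, broadcast across samples.
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha = np.asarray(alpha, dtype=float)
        alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)

    # (1 - p_t)^gamma shrinks the contribution of well-classified labels.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()
```

With `gamma = 0` and no `alpha`, the function reduces to ordinary binary cross-entropy. Raising `gamma` down-weights labels the model already predicts confidently, which is the general mechanism by which focal loss shifts attention toward hard, often minority-class, examples.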
Pages: 17
Related papers
48 items in total
  • [31] DMLDA-LocLIFT: Identification of multi-label protein subcellular localization using DMLDA dimensionality reduction and LIFT classifier
    Zhang, Qi
    Li, Shan
    Yu, Bin
    Zhang, Qingmei
    Han, Yu
    Zhang, Yan
    Ma, Qin
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2020, 206
  • [32] Predicting subcellular localization of multisite proteins using differently weighted multi-label k-nearest neighbors sets
    Jiang, Zhongting
    Wang, Dong
    Wu, Peng
    Chen, Yuehui
    Shang, Huijie
    Wang, Luyao
    Xie, Huichun
    TECHNOLOGY AND HEALTH CARE, 2019, 27 : S185 - S193
  • [33] mRNALocater: Enhance the prediction accuracy of eukaryotic mRNA subcellular localization by using model fusion strategy
    Tang, Qiang
    Nie, Fulei
    Kang, Juanjuan
    Chen, Wei
    MOLECULAR THERAPY, 2021, 29 (08) : 2617 - 2623
  • [34] Skills prediction based on multi-label resume classification using CNN with model predictions explanation
    Jiechieu, Kameni Florentin Flambeau
    Tsopze, Norbert
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (10) : 5069 - 5087
  • [35] Skills prediction based on multi-label resume classification using CNN with model predictions explanation
    Kameni Florentin Flambeau Jiechieu
    Norbert Tsopze
    Neural Computing and Applications, 2021, 33 : 5069 - 5087
  • [36] Predicting Viral Protein Subcellular Localization with Chou's Pseudo Amino Acid Composition and Imbalance-Weighted Multi-Label K-Nearest Neighbor Algorithm
    Cao, Jun-Zhe
    Liu, Wen-Qi
    Gu, Hong
    PROTEIN AND PEPTIDE LETTERS, 2012, 19 (11) : 1163 - 1169
  • [37] R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization
    Wan, Shibiao
    Mak, Man-Wai
    Kung, Sun-Yuan
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 360 : 34 - 45
  • [38] Improved Multi-Label Classification via a Generative Mixture Model Using Inter-Dependence Structure
    Simha, Ramanuja
    Shatkay, Hagit
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 1336 - 1343
  • [39] Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble
    Wang, Xiao
    Zhang, Jun
    Li, Guo-Zheng
    BMC BIOINFORMATICS, 2015, 16
  • [40] Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble
    Xiao Wang
    Jun Zhang
    Guo-Zheng Li
    BMC Bioinformatics, 16