Nmix: a hybrid deep learning model for precise prediction of 2'-O-methylation sites based on multi-feature fusion and ensemble learning

Authors
Geng, Yu-Qing [1]
Lai, Fei-Liao [1]
Luo, Hao [1]
Gao, Feng [1, 2, 3, 4]
Affiliations
[1] Tianjin Univ, Sch Sci, Dept Phys, 92 Weijin Rd, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, 92 Weijin Rd, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, 92 Weijin Rd, Tianjin 300072, Peoples R China
[4] Collaborat Innovat Ctr Chem Sci & Engn Tianjin, SynBio Res Platform, 92 Weijin Rd, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
2'-O-methylation; multi-feature fusion; deep learning; asymmetric loss; ensemble learning; HIGH-THROUGHPUT; MESSENGER-RNA; IDENTIFICATION; LANDSCAPE;
DOI
10.1093/bib/bbae601
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
RNA 2'-O-methylation (Nm) is a crucial post-transcriptional modification with significant biological implications. However, experimental identification of Nm sites is challenging and resource-intensive. While multiple computational tools have been developed to identify Nm sites, their predictive performance, particularly in terms of precision and generalization capability, remains deficient. We introduced Nmix, an advanced computational tool for precise prediction of Nm sites in human RNA. We constructed the largest low-redundancy dataset of experimentally verified Nm sites and employed an innovative multi-feature fusion approach, combining one-hot, Z-curve and RNA secondary structure encoding. Nmix utilizes a meticulously designed hybrid deep learning architecture, integrating 1D/2D convolutional neural networks, a self-attention mechanism and residual connections. We implemented an asymmetric loss function and Bayesian optimization-based ensemble learning, substantially improving predictive performance on imbalanced datasets. Rigorous testing on two benchmark datasets revealed that Nmix significantly outperforms existing state-of-the-art methods across various metrics, particularly in precision, with average improvements of 33.1% and 60.0%, and in Matthews correlation coefficient, with average improvements of 24.7% and 51.1%. Notably, Nmix demonstrated exceptional cross-species generalization capability, accurately predicting 93.8% of experimentally verified Nm sites in rat RNA. We also developed a user-friendly web server (https://tubic.org/Nm) and provided standalone prediction scripts to facilitate widespread adoption. We hope that by providing a more accurate and robust tool for Nm site prediction, we can contribute to advancing our understanding of Nm mechanisms and potentially benefit the prediction of other RNA modification sites.
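The abstract names one-hot and Z-curve encoding as two of the fused feature types. As a rough illustration only, the sketch below encodes an RNA window with a one-hot matrix and the standard cumulative Z-curve coordinates, then concatenates them per position. The function names (`one_hot`, `z_curve`, `fuse`), the cumulative form of the Z-curve, and the simple concatenation are assumptions made for this example; the record does not specify how Nmix computes or fuses these features, and the secondary structure encoding is omitted.

```python
# Hypothetical sketch: not the Nmix implementation.
BASES = "ACGU"  # column order of the one-hot matrix (assumed)


def one_hot(seq):
    """Return one row [A, C, G, U] per nucleotide; unknown bases give all zeros."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return [[1 if b == base else 0 for base in BASES] for b in seq]


def z_curve(seq):
    """Cumulative Z-curve coordinates (x, y, z) at each position.

    x: purine (A, G) vs pyrimidine (C, U)
    y: amino (A, C) vs keto (G, U)
    z: weak (A, U) vs strong (G, C) hydrogen bonding
    """
    counts = [0, 0, 0, 0]
    coords = []
    for row in one_hot(seq):
        counts = [c + r for c, r in zip(counts, row)]
        a, c, g, u = counts
        coords.append([(a + g) - (c + u), (a + c) - (g + u), (a + u) - (g + c)])
    return coords


def fuse(seq):
    """Concatenate the two encodings per position (assumed fusion scheme)."""
    return [oh + zc for oh, zc in zip(one_hot(seq), z_curve(seq))]


if __name__ == "__main__":
    for row in fuse("ACGU"):
        print(row)
```

Each position yields seven values here: four one-hot indicators followed by three Z-curve coordinates. A real pipeline would likely normalize the Z-curve values and handle the feature channels separately before the network layers.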
Pages: 14