miProBERT: identification of microRNA promoters based on the pre-trained model BERT

Cited by: 2
Authors
Wang, Xin [1 ]
Gao, Xin [2 ,3 ,4 ]
Wang, Guohua [1 ]
Li, Dan [5 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] King Abdullah Univ Sci & Technol KAUST, Comp Sci, Thuwal, Saudi Arabia
[3] KAUST, Computat Biosci Res Ctr CBRC, Thuwal, Saudi Arabia
[4] KAUST, Smart Hlth Initiat SHI, Thuwal, Saudi Arabia
[5] Northeast Forestry Univ, Coll Informat & Comp Engn, Harbin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; ncRNA; microRNA promoter; BERT; GENOME-WIDE ANALYSIS; MECHANISM; COMPLEX; MIRNAS;
DOI
10.1093/bib/bbad093
Chinese Library Classification
Q5 [Biochemistry];
Subject Classification Codes
071010; 081704;
Abstract
Accurate prediction of the promoter regions that drive miRNA gene expression remains a major challenge because of the lack of annotation information for pri-miRNA transcripts. This gap hinders our understanding of miRNA-mediated regulatory networks. Several algorithms have been designed over the past decade to detect miRNA promoters; however, these methods rely on biological signal data such as CpG islands and still leave room for improvement. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We take the pre-trained model DNABERT, fine-tune it on a gene promoter dataset so that its representations capture the richer biological properties of promoter sequences, and then systematically scan the upstream region of each intergenic miRNA with the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct the negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperforms other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperforms previously developed miRNA promoter prediction programs, achieving 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing their conservation, CpG content and histone marks. These results highlight the effectiveness and robustness of miProBERT.
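As a rough, illustrative sketch of the pipeline summarized above, the Python snippet below shows (i) DNABERT-style overlapping k-mer tokenization, (ii) construction of negatives by randomly substituting bases in positive promoter sequences, and (iii) a sliding-window scan of the upstream region of an intergenic miRNA. This is a minimal sketch only: the k-mer size, window size, stride, substitution rate, decision threshold and the score_promoter stub are illustrative assumptions rather than values or code taken from the paper; in the real pipeline the stub would be replaced by the fine-tuned DNABERT sequence classifier.

import random

# DNABERT-style tokenization: split a DNA sequence into overlapping k-mers
# joined by spaces (DNABERT uses k = 6).
def to_kmers(seq, k=6):
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

# Negative-set construction (sketch): randomly substitute a fraction of the
# bases of a positive promoter sequence. The 30% rate is an assumed value,
# not taken from the paper.
def random_substitution(seq, rate=0.3, seed=None):
    rng = random.Random(seed)
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

# Placeholder scorer: in the real pipeline this would call the fine-tuned
# DNABERT classifier on the tokenized window and return the promoter
# probability; here it is a stub so the scanning logic stays self-contained.
def score_promoter(window_seq):
    _tokens = to_kmers(window_seq)  # what would be fed to the model
    return 0.0                      # stub probability

# Systematic scan of an upstream region: slide a fixed-size window along the
# sequence and keep windows whose promoter probability exceeds a threshold.
def scan_upstream(upstream_seq, window=1000, stride=100, threshold=0.5):
    hits = []
    for start in range(0, max(len(upstream_seq) - window + 1, 1), stride):
        candidate = upstream_seq[start:start + window]
        prob = score_promoter(candidate)
        if prob >= threshold:
            hits.append((start, start + window, prob))
    return hits

In a full implementation, windows scoring above the threshold could then be merged or ranked to nominate a putative promoter region for each miRNA.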
Pages: 10
Related papers
50 results in total
  • [1] Patent classification with pre-trained Bert model
    Kahraman, Selen Yuecesoy
    Durmusoglu, Alptekin
    Dereli, Tuerkay
    [J]. JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2024, 39(4): 2485-2496
  • [2] Research on Chinese Intent Recognition Based on BERT pre-trained model
    Zhang, Pan
    Huang, Li
    [J]. 2020 5TH INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2020), 2020: 128-132
  • [3] BERT-siRNA: siRNA target prediction based on BERT pre-trained interpretable model
    Xu, Jiayu
    Xu, Nan
    Xie, Weixin
    Zhao, Chengkui
    Yu, Lei
    Feng, Weixing
    [J]. GENE, 2024, 910
  • [4] Chinese Grammatical Correction Using BERT-based Pre-trained Model
    Wang, Hongfei
    Kurosawa, Michiki
    Katsumata, Satoru
    Komachi, Mamoru
    [J]. 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020: 163-168
  • [5] Leveraging Pre-trained BERT for Audio Captioning
    Liu, Xubo
    Mei, Xinhao
    Huang, Qiushi
    Sun, Jianyuan
    Zhao, Jinzheng
    Liu, Haohe
    Plumbley, Mark D.
    Kilic, Volkan
    Wang, Wenwu
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022: 1145-1149
  • [6] Inference-based No-Learning Approach on Pre-trained BERT Model Retrieval
    Pham, Huu-Long
    Mibayashi, Ryota
    Yamamoto, Takehiro
    Kato, Makoto P.
    Yamamoto, Yusuke
    Shoji, Yoshiyuki
    Ohshima, Hiroaki
    [J]. 2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024: 234-241
  • [7] BTLink : automatic link recovery between issues and commits based on pre-trained BERT model
    Lan, Jinpeng
    Gong, Lina
    Zhang, Jingxuan
    Zhang, Haoxiang
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2023, 28(4)
  • [8] Smart Edge-based Fake News Detection using Pre-trained BERT Model
    Guo, Yuhang
    Lamaazi, Hanane
    Mizouni, Rabeb
    [J]. 2022 18TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB), 2022
  • [9] BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model
    Chen, Song
    Liao, Hai
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2022, 36(1)