miProBERT: identification of microRNA promoters based on the pre-trained model BERT

Cited by: 2
Authors
Wang, Xin [1 ]
Gao, Xin [2 ,3 ,4 ]
Wang, Guohua [1 ]
Li, Dan [5 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] King Abdullah Univ Sci & Technol KAUST, Comp Sci, Thuwal, Saudi Arabia
[3] KAUST, Computat Biosci Res Ctr CBRC, Thuwal, Saudi Arabia
[4] KAUST, Smart Hlth Initiat SHI, Thuwal, Saudi Arabia
[5] Northeast Forestry Univ, Coll Informat & Comp Engn, Harbin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; ncRNA; microRNA promoter; BERT; GENOME-WIDE ANALYSIS; MECHANISM; COMPLEX; MIRNAS;
DOI
10.1093/bib/bbad093
Chinese Library Classification
Q5 [Biochemistry];
Subject Classification Codes
071010; 081704;
Abstract
Accurate prediction of the promoter regions that drive miRNA gene expression remains a major challenge because of the lack of annotation information for pri-miRNA transcripts. This gap hinders our understanding of miRNA-mediated regulatory networks. Several algorithms have been designed over the past decade to detect miRNA promoters; however, these methods rely on biological signal data such as CpG islands and still leave room for improvement. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We take the pre-trained model DNABERT, fine-tune it on a gene promoter dataset so that its representations capture the richer biological properties of promoter sequences, and then systematically scan the upstream region of each intergenic miRNA with the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct the negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperforms other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperforms previously developed miRNA promoter prediction programs, achieving 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing their conservation, CpG content and histone marks. These results highlight the effectiveness and robustness of miProBERT.
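As a rough, illustrative sketch of the pipeline summarized above, the Python snippet below shows (i) DNABERT-style overlapping k-mer tokenization, (ii) construction of negatives by randomly substituting bases in positive promoter sequences, and (iii) a sliding-window scan of the upstream region of an intergenic miRNA. This is a minimal sketch only: the k-mer size, window size, stride, substitution rate, decision threshold and the score_promoter stub are illustrative assumptions rather than values or code taken from the paper; in the real pipeline the stub would be replaced by the fine-tuned DNABERT sequence classifier.

import random

# DNABERT-style tokenization: split a DNA sequence into overlapping k-mers
# joined by spaces (DNABERT uses k = 6).
def to_kmers(seq, k=6):
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

# Negative-set construction (sketch): randomly substitute a fraction of the
# bases of a positive promoter sequence. The 30% rate is an assumed value,
# not taken from the paper.
def random_substitution(seq, rate=0.3, seed=None):
    rng = random.Random(seed)
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

# Placeholder scorer: in the real pipeline this would call the fine-tuned
# DNABERT classifier on the tokenized window and return the promoter
# probability; here it is a stub so the scanning logic stays self-contained.
def score_promoter(window_seq):
    _tokens = to_kmers(window_seq)  # what would be fed to the model
    return 0.0                      # stub probability

# Systematic scan of an upstream region: slide a fixed-size window along the
# sequence and keep windows whose promoter probability exceeds a threshold.
def scan_upstream(upstream_seq, window=1000, stride=100, threshold=0.5):
    hits = []
    for start in range(0, max(len(upstream_seq) - window + 1, 1), stride):
        candidate = upstream_seq[start:start + window]
        prob = score_promoter(candidate)
        if prob >= threshold:
            hits.append((start, start + window, prob))
    return hits

In a full implementation, windows scoring above the threshold could then be merged or ranked to nominate a putative promoter region for each miRNA.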
Pages: 10
Related papers
50 results in total
  • [1] Patent classification with pre-trained Bert model
    Kahraman, Selen Yuecesoy
    Durmusoglu, Alptekin
    Dereli, Tuerkay
    [J]. JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2024, 39(4): 2485-2496
  • [2] Research on Chinese Intent Recognition Based on BERT pre-trained model
    Zhang, Pan
    Huang, Li
    [J]. 2020 5TH INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2020), 2020: 128-132
  • [3] BERT-siRNA: siRNA target prediction based on BERT pre-trained interpretable model
    Xu, Jiayu
    Xu, Nan
    Xie, Weixin
    Zhao, Chengkui
    Yu, Lei
    Feng, Weixing
    [J]. GENE, 2024, 910
  • [4] Chinese Grammatical Correction Using BERT-based Pre-trained Model
    Wang, Hongfei
    Kurosawa, Michiki
    Katsumata, Satoru
    Komachi, Mamoru
    [J]. 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020: 163-168
  • [5] Leveraging Pre-trained BERT for Audio Captioning
    Liu, Xubo
    Mei, Xinhao
    Huang, Qiushi
    Sun, Jianyuan
    Zhao, Jinzheng
    Liu, Haohe
    Plumbley, Mark D.
    Kilic, Volkan
    Wang, Wenwu
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022: 1145-1149
  • [6] Inference-based No-Learning Approach on Pre-trained BERT Model Retrieval
    Pham, Huu-Long
    Mibayashi, Ryota
    Yamamoto, Takehiro
    Kato, Makoto P.
    Yamamoto, Yusuke
    Shoji, Yoshiyuki
    Ohshima, Hiroaki
    [J]. 2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024: 234-241
  • [7] BTLink : automatic link recovery between issues and commits based on pre-trained BERT model
    Lan, Jinpeng
    Gong, Lina
    Zhang, Jingxuan
    Zhang, Haoxiang
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2023, 28(4)
  • [8] Smart Edge-based Fake News Detection using Pre-trained BERT Model
    Guo, Yuhang
    Lamaazi, Hanane
    Mizouni, Rabeb
    [J]. 2022 18TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB), 2022
  • [9] BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model
    Chen, Song
    Liao, Hai
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2022, 36(1)