Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine

Cited by: 20
Authors
Mao, Rui [1, 2, 3]
Kumar, Praveen Kumar Raj [3]
Guo, Cheng [3]
Zhang, Yang [1, 2]
Liang, Chun [3, 4]
Affiliations
[1] Northwest A&F Univ, Coll Mech & Elect Engn, Yangling, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[3] Miami Univ, Dept Biol, Oxford, OH 45056 USA
[4] Miami Univ, Dept Comp Sci & Software Engn, Oxford, OH 45056 USA
Source
PLOS ONE | 2014 / Vol. 9 / Issue 08
Keywords
PARTICLE SWARM OPTIMIZATION; RNA SECONDARY STRUCTURE; FEATURE-SELECTION; REGULATORY ELEMENTS; MESSENGER-RNAS; IDENTIFICATION; CLASSIFICATION; MICROARRAY; RETENTION; COMPLEXITY;
DOI
10.1371/journal.pone.0104049
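Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code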
07 ; 0710 ; 09 ;
Abstract
One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows the creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants such as Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs versus constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. By comparing the coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs and CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter gamma of the SVM were set using a particle swarm optimization algorithm (PSOSVM). Classification achieved an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). We obtained not only the basic sequence features and positional distribution characteristics of RIs, but also predicted putative regulatory motifs in intron splicing based on our feature extraction approach. Our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.
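The abstract mentions tuning the SVM parameters C and gamma with particle swarm optimization and reporting an F-measure. The following is a minimal, illustrative sketch of those two ideas in plain Python, not the authors' implementation. The function names, the swarm settings, the search bounds, and especially `toy_cv_error` (a hypothetical stand-in for a cross-validated SVM error surface) are assumptions made for illustration only.

```python
import random


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pso_minimize(objective, bounds, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimal particle swarm optimizer over a box; returns (best_pos, best_val)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_cv_error(params):
    """Hypothetical stand-in for cross-validated SVM error at (log2 C, log2 gamma)."""
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


# Search in log2 space, a common (assumed) choice for C and gamma.
best, err = pso_minimize(toy_cv_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
best_c, best_gamma = 2.0 ** best[0], 2.0 ** best[1]
```

In a real setting the objective would train an RBF-kernel SVM at each candidate (C, gamma) and return a cross-validated error; the toy quadratic here only makes the search loop runnable without external libraries.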
Pages: 12
Related papers
50 in total
  • [31] Optimizing the performance of disease classification using nested-random forest and nested-support vector machine classifiers
    Department of Medical Informatics, Tzu Chi University, Taiwan
    J. Chem. Pharm. Res., 12 (1521-1528):
  • [32] Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis
    Friederike Maite Siemers
    Jürgen Bajorath
    Scientific Reports, 13
  • [33] Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis
    Siemers, Friederike Maite
    Bajorath, Juergen
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [34] Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers
    Brienne Sprague
    Qian Shi
    Marlene T. Kim
    Liying Zhang
    Alexander Sedykh
    Eiichiro Ichiishi
    Harukuni Tokuda
    Kuo-Hsiung Lee
    Hao Zhu
    Journal of Computer-Aided Molecular Design, 2014, 28 : 631 - 646
  • [35] Nonlinear Methodologies for Identifying Seismic Event and Nuclear Explosion Using Random Forest, Support Vector Machine, and Naive Bayes Classification
    Dong, Longjun
    Li, Xibing
    Xie, Gongnan
    ABSTRACT AND APPLIED ANALYSIS, 2014
  • [36] Modeling and optimizing callus growth and development in Cannabis sativa using random forest and support vector machine in combination with a genetic algorithm
    Hesami, Mohsen
    Jones, Andrew Maxwell Phineas
    APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, 2021, 105 (12) : 5201 - 5212
  • [37] Modeling and optimizing callus growth and development in Cannabis sativa using random forest and support vector machine in combination with a genetic algorithm
    Mohsen Hesami
    Andrew Maxwell Phineas Jones
    Applied Microbiology and Biotechnology, 2021, 105 : 5201 - 5212
  • [38] Gas sensor array to classify the chicken meat with E. coli contaminant by using random forest and support vector machine
    Astuti, Suryani Dyah
    Tamimi, Mohammad H.
    Pradhana, Anak A.S.
    Alamsyah, Kartika A.
    Purnobasuki, Hery
    Khasanah, Miratul
    Susilo, Yunus
    Triyana, Kuwat
    Kashif, Muhammad
    Syahrom, Ardiyansyah
    Biosensors and Bioelectronics: X, 2021, 9
  • [39] Wetland conversion risk assessment of East Kolkata Wetland: A Ramsar site using random forest and support vector machine model
    Ghosh, Sasanka
    Das, Arijit
    JOURNAL OF CLEANER PRODUCTION, 2020, 275
  • [40] Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers
    Sprague, Brienne
    Shi, Qian
    Kim, Marlene T.
    Zhang, Liying
    Sedykh, Alexander
    Ichiishi, Eiichiro
    Tokuda, Harukuni
    Lee, Kuo-Hsiung
    Zhu, Hao
    JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2014, 28 (06) : 631 - 646