Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine

Cited by: 20
Authors
Mao, Rui [1 ,2 ,3 ]
Kumar, Praveen Kumar Raj [3 ]
Guo, Cheng [3 ]
Zhang, Yang [1 ,2 ]
Liang, Chun [3 ,4 ]
Affiliations
[1] Northwest A&F Univ, Coll Mech & Elect Engn, Yangling, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[3] Miami Univ, Dept Biol, Oxford, OH 45056 USA
[4] Miami Univ, Dept Comp Sci & Software Engn, Oxford, OH 45056 USA
Source
PLOS ONE | 2014, Vol. 9, Issue 08
Keywords
PARTICLE SWARM OPTIMIZATION; RNA SECONDARY STRUCTURE; FEATURE-SELECTION; REGULATORY ELEMENTS; MESSENGER-RNAS; IDENTIFICATION; CLASSIFICATION; MICROARRAY; RETENTION; COMPLEXITY;
DOI
10.1371/journal.pone.0104049
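Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code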
07 ; 0710 ; 09 ;
Abstract
Alternative splicing is an important mode of post-transcriptional pre-mRNA modification. It allows many distinct mature mRNA transcripts to be created from a single gene by using different splice sites. In plants such as Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs versus constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. By comparing the coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between the sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs from CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter gamma of the SVM were set using a particle swarm optimization algorithm (PSOSVM). Our classification achieved an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). Beyond the basic sequence features and positional distribution characteristics of RIs, our feature extraction approach also predicted putative regulatory motifs in intron splicing. Our study will facilitate a better understanding of the mechanisms underlying intron retention.
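The abstract reports its results as an F-measure. As a reminder of what that metric summarizes, here is a minimal Python sketch that computes it from confusion counts, with RIs as the positive class. The function name `f_measure` and the counts are hypothetical, chosen only for illustration. They are not the paper's data, and the abstract does not say how the authors averaged the score across classes or folds.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts (positive class assumed to be RIs).

    tp: positives correctly predicted (e.g. RIs called RIs)
    fp: negatives predicted positive (e.g. CSIs called RIs)
    fn: positives predicted negative (e.g. RIs called CSIs)
    beta: weight of recall relative to precision (1.0 gives F1)
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only.
score = f_measure(tp=80, fp=20, fn=20)
print(f"F-measure: {score:.3f}")
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a classifier scores well only when both are high.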
Pages: 12
Related papers
50 in total
  • [41] Comparative Analysis of Support Vector Machine, Random Forest and k-Nearest Neighbor Classifiers for Predicting Remaining Usage Life of Roller Bearings
    Palaniappan R.
    Informatica (Slovenia), 2024, 48 (07): : 39 - 52
  • [42] Recognition of gasoline in fire debris using machine learning: Part I, application of random forest, gradient boosting, support vector machine, and naive bayes
    Bogdal, C.
    Schellenberg, R.
    Hoepli, O.
    Bovens, M.
    Lory, M.
    FORENSIC SCIENCE INTERNATIONAL, 2022, 331
  • [43] Modeling of Wire Electro-Spark Machining of Inconel 690 Superalloy Using Support Vector Machine and Random Forest Regression Approaches
    Raj, Atul
    Misra, Joy Prakash
    Khanduja, Dinesh
    JOURNAL OF ADVANCED MANUFACTURING SYSTEMS, 2022, 21 (03) : 557 - 571
  • [44] Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble
    Loew, Fabian
    Schorcht, Gunther
    Michel, Ulrich
    Dech, Stefan
    Conrad, Christopher
    EARTH RESOURCES AND ENVIRONMENTAL REMOTE SENSING/GIS APPLICATIONS III, 2012, 8538
  • [45] Modeling color fading ozonation of reactive-dyed cotton using the Extreme Learning Machine, Support Vector Regression and Random Forest
    He, Zhenglei
    Kim-Phuc Tran
    Thomassey, Sebastien
    Zeng, Xianyi
    Xu, Jie
    Yi Changhai
    TEXTILE RESEARCH JOURNAL, 2020, 90 (7-8) : 896 - 908
  • [46] Ionic surfactants critical micelle concentration modelling in water/organic solvent mixtures using random forest and support vector machine algorithms
    Soria-Lopez, Anton
    Garcia-Marti, Maria
    Mejuto, Juan C.
    TENSIDE SURFACTANTS DETERGENTS, 2025, 62 (01) : 8 - 18
  • [47] Diabetes Prediction using Decision Tree, Random Forest, Support Vector Machine, K- Nearest Neighbors, Logistic Regression Classifiers
    Peerbasha, S.
    Raja, A. Saleem
    Praveen, K. P.
    Iqbal, Y. Mohammed
    Surputheen, Mohamed
    JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH, 2023, 5 (04): : 42 - 54
  • [48] Intracranial Hemorrhage Detection in Head CT Using Double-Branch Convolutional Neural Network, Support Vector Machine, and Random Forest
    Sage, Agata
    Badura, Pawel
    APPLIED SCIENCES-BASEL, 2020, 10 (21): : 1 - 13
  • [49] Comprehensive quality assessment of Dendrubium officinale using ATR-FTIR spectroscopy combined with random forest and support vector machine regression
    Wang, Ye
    Huang, Heng-Yu
    Zuo, Zhi-Tian
    Wang, Yuan-Zhong
    SPECTROCHIMICA ACTA PART A-MOLECULAR AND BIOMOLECULAR SPECTROSCOPY, 2018, 205 : 637 - 648
  • [50] Classification of nanofluids solutions based on viscosity values: A comparative study of random forest, logistic model tree, Bayesian network, and support vector machine models
    Mohammadi, Mahsa
    Khorrami, Mohammadreza Khanmohammadi
    Ghasemzadeh, Hossein
    INFRARED PHYSICS & TECHNOLOGY, 2022, 125