Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine

Cited by: 20
Authors
Mao, Rui [1, 2, 3]
Kumar, Praveen Kumar Raj [3]
Guo, Cheng [3]
Zhang, Yang [1, 2]
Liang, Chun [3, 4]
Affiliations
[1] Northwest A&F Univ, Coll Mech & Elect Engn, Yangling, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[3] Miami Univ, Dept Biol, Oxford, OH 45056 USA
[4] Miami Univ, Dept Comp Sci & Software Engn, Oxford, OH 45056 USA
Source
PLOS ONE | 2014 / Volume 9 / Issue 08
Keywords
PARTICLE SWARM OPTIMIZATION; RNA SECONDARY STRUCTURE; FEATURE-SELECTION; REGULATORY ELEMENTS; MESSENGER-RNAS; IDENTIFICATION; CLASSIFICATION; MICROARRAY; RETENTION; COMPLEXITY;
DOI
10.1371/journal.pone.0104049
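Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract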
Alternative splicing is one of the important modes of post-transcriptional pre-mRNA modification. It allows many distinct mature mRNA transcripts to be created from a single gene by using different splice sites. In plants such as Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs against constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. We obtained our high-quality experimental data by comparing the coordinates of introns of all annotated mRNAs from TAIR10. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs against CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter gamma of the SVM were set using a particle swarm optimization algorithm (PSOSVM). Our classifiers achieved an F-Measure of 80.8% (random forest) and 77.4% (PSOSVM). Our feature extraction approach yielded not only the basic sequence features and positional distribution characteristics of RIs, but also predictions of putative regulatory motifs in intron splicing. Our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.
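The F-Measure reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definition (the harmonic mean of precision and recall) and treats RI as the positive class; the abstract states neither choice, and the function name `f_measure` and the toy labels are hypothetical, not taken from the paper.

```python
def f_measure(true_labels, predicted_labels, positive="RI"):
    """Return (precision, recall, F-measure) for the positive class.

    Assumption: F-measure is the harmonic mean of precision and recall (F1).
    """
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: three retained introns (RI) and three
# constitutively spliced introns (CSI), with two misclassifications.
truth = ["RI", "RI", "RI", "CSI", "CSI", "CSI"]
predicted = ["RI", "RI", "CSI", "RI", "CSI", "CSI"]
precision, recall, f1 = f_measure(truth, predicted)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision, recall, and the F-measure all equal 2/3.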
Pages: 12
Related papers
50 items in total
  • [21] Predicting Calcein Release from Ultrasound-Targeted Liposomes: A Comparative Analysis of Random Forest and Support Vector Machine
    Shomope, Ibrahim
    Percival, Kelly M.
    Jabbar, Nabil M. Abdel
    Husseini, Ghaleb A.
    TECHNOLOGY IN CANCER RESEARCH & TREATMENT, 2024, 23
  • [22] Benthic Habitat Mapping on Different Coral Reef Types Using Random Forest and Support Vector Machine Algorithm
    Zhafarina, Zhafirah
    Wicaksono, Pramaditya
    SIXTH INTERNATIONAL SYMPOSIUM ON LAPAN-IPB SATELLITE (LISAT 2019), 2019, 11372
  • [23] Implementing a Network Intrusion Detection System Using Semi-supervised Support Vector Machine and Random Forest
    Shah, Sandeep
    Muhuri, Pramita Sree
    Yuan, Xiaohong
    Roy, Kaushik
    Chatterjee, Prosenjit
    ACMSE 2021: PROCEEDINGS OF THE 2021 ACM SOUTHEAST CONFERENCE, 2021, : 180 - 184
  • [24] A hybrid colour model based land cover classification using random forest and support vector machine classifiers
    Rama, M. Christy
    Mahendran, D. S.
    Kumar, T. C. Raja
    INTERNATIONAL JOURNAL OF APPLIED PATTERN RECOGNITION, 2018, 5 (02) : 87 - 100
  • [25] Automated Comparative Predictive Analysis of Deception Detection in Convicted Offenders Using Polygraph with Random Forest, Support Vector Machine, and Artificial Neural Network Models
    Rad, Dana
    Kiss, Csaba
    Paraschiv, Nicolae
    Balas, Valentina Emilia
    STUDIES IN INFORMATICS AND CONTROL, 2024, 33 (03):
  • [26] Comparative performance of convolutional neural network, weighted and conventional support vector machine and random forest for classifying tree species using hyperspectral and photogrammetric data
    Sothe, C.
    De Almeida, C. M.
    Schimalski, M. B.
    La Rosa, L. E. C.
    Castro, J. D. B.
    Feitosa, R. Q.
    Dalponte, M.
    Lima, C. L.
    Liesenberg, V.
    Miyoshi, G. T.
    Tommaselli, A. M. G.
    GISCIENCE & REMOTE SENSING, 2020, 57 (03) : 369 - 394
  • [27] Comparison of Support Vector Machine and Random Forest Algorithms for Invasive and Expansive Species Classification Using Airborne Hyperspectral Data
    Sabat-Tomala, Anita
    Raczko, Edwin
    Zagajewski, Bogdan
    REMOTE SENSING, 2020, 12 (03)
  • [28] Improved Accuracy in Heart Disease Prediction using Novel Random Forest Algorithm in Comparison with Support Vector Machine Algorithm
    Poojitha, T.
    Mahaveerakannan, R.
    CARDIOMETRY, 2022, (25): : 1546 - 1553
  • [29] Improving the Efficiency of Heart Disease Prediction Using Novel Random Forest Classifier Over Support Vector Machine Algorithm
    Teja, P. Prasanna Sai
    Veeramani, T.
    CARDIOMETRY, 2022, (25): : 1468 - 1476
  • [30] Fusion of Ultraviolet and Infrared Spectra Using Support Vector Machine and Random Forest Models for the Discrimination of Wild and Cultivated Mushrooms
    Yao, Sen
    Li, Jie-Qing
    Duan, Zhi-Li
    Li, Tao
    Wang, Yuan-Zhong
    ANALYTICAL LETTERS, 2020, 53 (07) : 1019 - 1033