Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine

Cited by: 20
Authors
Mao, Rui [1, 2, 3]
Kumar, Praveen Kumar Raj [3]
Guo, Cheng [3]
Zhang, Yang [1, 2]
Liang, Chun [3, 4]
Affiliations
[1] Northwest A&F Univ, Coll Mech & Elect Engn, Yangling, Shaanxi, Peoples R China
[2] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[3] Miami Univ, Dept Biol, Oxford, OH 45056 USA
[4] Miami Univ, Dept Comp Sci & Software Engn, Oxford, OH 45056 USA
Source
PLOS ONE | 2014 / Vol. 9 / Issue 08
Keywords
PARTICLE SWARM OPTIMIZATION; RNA SECONDARY STRUCTURE; FEATURE-SELECTION; REGULATORY ELEMENTS; MESSENGER-RNAS; IDENTIFICATION; CLASSIFICATION; MICROARRAY; RETENTION; COMPLEXITY;
DOI
10.1371/journal.pone.0104049
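Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09
Abstract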
Alternative splicing is one of the important modes of pre-mRNA post-transcriptional modification. It allows many distinct mature mRNA transcripts to be created from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. By comparing the coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs from CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter gamma in SVM were set using a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-Measure of 80.8% (random forest) and 77.4% (PSOSVM). With our feature extraction approach, we not only obtained the basic sequence features and positional distribution characteristics of RIs but also predicted putative regulatory motifs in intron splicing. Our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.
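The abstract describes setting the SVM parameters C and gamma by particle swarm optimization and scoring classifiers by F-Measure. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The function `toy_score` is a hypothetical stand-in for a cross-validated F-measure of an RBF SVM, and the log2 search ranges, swarm size, and PSO coefficients are assumptions chosen only for illustration.

```python
import random

def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pso_maximize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization inside a box."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best scores
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def toy_score(params):
    """Hypothetical stand-in for a cross-validated F-measure.

    In a real pipeline this would train an RBF SVM with
    C = 2**log2_c and gamma = 2**log2_gamma and return its F-measure.
    """
    log2_c, log2_gamma = params
    return 1.0 - 0.01 * ((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2)

# Assumed search ranges for log2(C) and log2(gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]
best_params, best_score = pso_maximize(toy_score, BOUNDS)
best_c, best_gamma = 2.0 ** best_params[0], 2.0 ** best_params[1]
```

Searching in log2 space is a common convention for C and gamma because useful values span several orders of magnitude; replacing `toy_score` with a real cross-validation routine would turn the sketch into an actual parameter search.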
Pages: 12
Related papers
50 in total
  • [1] 50/50 Expressional Odds of Retention Signifies the Distinction between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana
    Mao, Rui
    Liang, Chun
    Zhang, Yang
    Hao, Xingan
    Li, Jinyan
    FRONTIERS IN PLANT SCIENCE, 2017, 8
  • [2] Support vector machine approach for retained introns prediction using sequence features
    Xia, Huiyu
    Bi, Jianning
    Li, Yanda
    ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 3, PROCEEDINGS, 2006, 3973 : 654 - 659
  • [3] Comparison between Support Vector Machine and Random Forest for Audio Classification
    Ansari, Md Rifat
    Tumpa, Sadia Alam
    Raya, Jannat Ara Ferdouse
    Murshed, Mohammad N.
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND INFORMATION TECHNOLOGY 2021 (ICECIT 2021), 2021,
  • [4] Comparative analysis of Random Forest and Support Vector Machine for benthic habitat segmentation
    Narciso, Gilson A. M.
    Tamondong, Ayin M.
    Blanco, Ariel C.
    Nakamura, Takashi
    Nadaoka, Kazuo
    EIGHTH GEOINFORMATION SCIENCE SYMPOSIUM 2023: GEOINFORMATION SCIENCE FOR SUSTAINABLE PLANET, 2024, 12977
  • [5] FLOOD RISK MAPPING USING RANDOM FOREST AND SUPPORT VECTOR MACHINE
    Ganjirad, M.
    Delavar, M. R.
    ISPRS GEOSPATIAL CONFERENCE 2022, JOINT 6TH SENSORS AND MODELS IN PHOTOGRAMMETRY AND REMOTE SENSING, SMPR/4TH GEOSPATIAL INFORMATION RESEARCH, GIRESEARCH CONFERENCES, VOL. 10-4, 2023, : 201 - 208
  • [6] Comparison between random forest and support vector machine algorithms for LULC classification
    Avci, Cengiz
    Budak, Muhammed
    Yagmur, Nur
    Balcik, Filiz Bektas
    INTERNATIONAL JOURNAL OF ENGINEERING AND GEOSCIENCES, 2023, 8 (01): : 1 - 10
  • [7] Seepage and dam deformation analyses with statistical models: support vector regression machine and random forest
    Belmokre, Ahmed
    Mihoubi, Mustapha Kamel
    Santillan, David
    3RD INTERNATIONAL CONFERENCE ON STRUCTURAL INTEGRITY (ICSI 2019), 2019, 17 : 698 - 703
  • [8] HRFSVM: identification of fish disease using hybrid Random Forest and Support Vector Machine
    Jhansi, G.
    Sujatha, K.
    ENVIRONMENTAL MONITORING AND ASSESSMENT, 2023, 195 (08)
  • [10] Forest mapping in Peninsular Malaysia using Random Forest and Support Vector Machine Classifiers on Google Earth Engine
    Muhammad, Farah Nuralissa
    Choy, Lam Kuok
    GEOGRAFIA-MALAYSIAN JOURNAL OF SOCIETY & SPACE, 2023, 19 (03): : 1 - 16