Effective Feature Selection for Classification of Promoter Sequences

Cited by: 5
Authors
Kouser, K. [1]
Lavanya, P. G. [1]
Rangarajan, Lalitha [1]
Kshitish, Acharya K. [2, 3]
Affiliations
[1] DoS Comp Sci, Mysore, Karnataka, India
[2] IBAB, Biotech Pk, Bangalore, Karnataka, India
[3] Shodhaka Life Sci Pvt Ltd, IBAB, Biotech Pk, Bangalore, Karnataka, India
Source
PLOS ONE | 2016 / Vol. 11 / Issue 12
Keywords
GENE-EXPRESSION; DNA; PREDICTION; MODES;
DOI
10.1371/journal.pone.0167165
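Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code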
07 ; 0710 ; 09 ;
Abstract
Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Pages: 20
Related papers
50 in total
  • [1] Effective Automated Feature Construction and Selection for Classification of Biological Sequences
    Kamath, Uday
    De Jong, Kenneth
    Shehu, Amarda
    [J]. PLOS ONE, 2014, 9 (07)
  • [2] Effective feature selection technique for text classification
    Seetha, Hari
    Murty, M. Narasimha
    Saravanan, R.
    [J]. INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (03) : 165 - 184
  • [3] Effective feature selection using feature vector graph for classification
    Zhao, Guodong
    Wu, Yan
    Chen, Fuqiang
    Zhang, Junming
    Bai, Jing
    [J]. NEUROCOMPUTING, 2015, 151 : 376 - 389
  • [4] Improved Feature Selection Algorithm for Biological Sequences Classification
    Guannoni, Naoual
    Mhamdi, Faouzi
    Elloumi, Mourad
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT I, 2019, 11775 : 689 - 700
  • [5] Effective Text Classification by a Supervised Feature Selection Approach
    Basu, Tanmay
    Murthy, C. A.
    [J]. 12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012 : 918 - 925
  • [6] mRMR plus : An Effective Feature Selection Algorithm for Classification
    Chowdhury, Hussain A.
    Bhattacharyya, Dhruba K.
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2017, 2017, 10597 : 424 - 430
  • [7] Effective classification using feature selection and fuzzy integration
    Pizzi, Nick J.
    Pedrycz, Witold
    [J]. FUZZY SETS AND SYSTEMS, 2008, 159 (21) : 2859 - 2872
  • [8] An Improved Feature Selection Based on Effective Range for Classification
    Wang, Jianzhong
    Zhou, Shuang
    Yi, Yugen
    Kong, Jun
    [J]. SCIENTIFIC WORLD JOURNAL, 2014
  • [9] THE EFFECT OF REPRESENTATIVE TRAINING DATASET SELECTION ON THE CLASSIFICATION PERFORMANCE OF THE PROMOTER SEQUENCES
    Yaman, Ayse Gul
    Can, Tolga
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL SYMPOSIUM ON HEALTH INFORMATICS AND BIOINFORMATICS (HIBIT'11), 2011 : 55 - 58
  • [10] Effective Feature Selection for Multi-class Classification Models
    Lin, Hung-Yi
    [J]. WORLD CONGRESS ON ENGINEERING - WCE 2013, VOL III, 2013 : 1474 - 1479