Effective Feature Selection for Classification of Promoter Sequences

Cited by: 5
Authors
Kouser, K. [1]
Lavanya, P. G. [1]
Rangarajan, Lalitha [1]
Kshitish, Acharya K. [2, 3]
Affiliations
[1] DoS Comp Sci, Mysore, Karnataka, India
[2] IBAB, Biotech Pk, Bangalore, Karnataka, India
[3] Shodhaka Life Sci Pvt Ltd, IBAB, Biotech Pk, Bangalore, Karnataka, India
Source
PLOS ONE | 2016 / Volume 11 / Issue 12
Keywords
GENE-EXPRESSION; DNA; PREDICTION; MODES;
DOI
10.1371/journal.pone.0167165
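Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract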
Exploring novel computational methods for making sense of biological data has been not only necessary but also productive. Part of this trend is the search for more efficient in silico methods and tools for analyzing promoters, the parts of DNA sequences that are involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in function depending on their nucleotide sequence and on the arrangement of short protein-binding regions called motifs. In fact, the regulatory nature of promoters seems to be driven largely by the selective presence and/or arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, since such classification can pave a new way for functional analysis of promoters and for discovering the functionally crucial motifs. We use Position Specific Motif Matrix (PSMM) features to explore whether promoter sequences can be classified accurately with some of the popular classification techniques. Classification performance on the complete feature set is low, perhaps because of the huge number of features. We propose two ways of reducing the features. Our test results show that classification improves after feature reduction. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some popular feature transformation methods such as PCA and SVD. The proposed methods are also as accurate as MRMR (a feature selection method) but much faster. Such methods could be useful for categorizing new promoters and for exploring the regulatory mechanisms of gene expression in complex eukaryotic species.
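To make the idea in the abstract concrete, the following is a minimal sketch of position-specific motif features followed by a simple filter-style feature reduction. It is not the paper's method: the abstract does not describe how the PSMM is built or what the two proposed reduction methods are, so the binary "motif starts in window" encoding, the class-mean-difference score, and the names `psmm_features`, `select_top_features` and `reduce_features` are all assumptions made for illustration only.

```python
def psmm_features(seq, motifs, window):
    """Assumed encoding: one binary feature per (motif, window) pair,
    set to 1 when the motif starts inside that window of the sequence."""
    n_windows = len(seq) // window
    features = []
    for motif in motifs:
        for w in range(n_windows):
            start = w * window
            # Extend the slice so a motif starting at the window's
            # last position is still fully contained in it.
            segment = seq[start:start + window + len(motif) - 1]
            features.append(1 if motif in segment else 0)
    return features


def select_top_features(X, y, keep):
    """Stand-in filter (not the paper's): rank features by the absolute
    difference of their mean value in class 1 versus class 0."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def reduce_features(X, kept):
    """Keep only the selected feature columns."""
    return [[row[j] for j in kept] for row in X]


# Toy data, invented for this sketch: two promoter classes, two motifs.
sequences = ["TATAAAGGCC", "TATAATGGCC", "GGCCGGTATA", "CCGGCCTATA"]
labels = [1, 1, 0, 0]
motifs = ["TATA", "GGCC"]

X = [psmm_features(s, motifs, window=5) for s in sequences]
kept = select_top_features(X, labels, keep=2)
X_reduced = reduce_features(X, kept)
```

The reduced matrix `X_reduced` could then be passed to any classifier, such as a decision tree; a filter of this kind scores each feature independently, which is one reason such methods tend to be cheap compared with selection schemes that also model redundancy between features.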
Pages: 20