Novel feature selection based on apriori property and correlation analysis for protein sequence classification using MapReduce

Cited by: 2
Authors
Bhavani, R. [1 ]
Sadasivam, G. Sudha [2 ]
Affiliations
[1] Govt Coll Technol, Dept Comp Sci & Engn, Coimbatore, Tamil Nadu, India
[2] PSG Coll Technol, Dept Comp Sci & Engn, Coimbatore, Tamil Nadu, India
Keywords
apriori property; sequence classification; correlation analysis; feature subset selection; MapReduce; bioinformatics; STRUCTURAL CLASS;
DOI
10.1504/IJDMB.2017.10006248
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Feature selection is a crucial step in classifying protein sequences into existing superfamilies. Classifying protein sequences into families based on their sequence patterns helps predict protein structure and function. This paper proposes a novel feature selection algorithm that first transforms the protein sequences into feature vectors and then reduces the size of each feature vector using the apriori property and a correlation measure, implemented with MapReduce programming on the Hadoop framework. Experimental results show that the proposed method reduces the number of features by 99% and improves accuracy by 5% to 6%.
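The abstract does not spell out the feature representation or the correlation measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the features are k-mers (substrings of length k), that support is the number of sequences containing a k-mer, and that the correlation measure is the Pearson correlation between a k-mer's presence and a binary class label. The map and reduce steps are simulated in plain Python rather than run on Hadoop, and all function names and thresholds (`map_kmers`, `min_support`, `min_corr`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import chain
from math import sqrt


def map_kmers(sequence, k, allowed=None):
    """Map step (simulated): emit (k-mer, 1) once per sequence containing it."""
    seen = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Apriori property: a k-mer can be frequent only if both of its
        # (k-1)-length substrings were frequent in the previous pass.
        if allowed is not None and (
            kmer[:-1] not in allowed or kmer[1:] not in allowed
        ):
            continue
        seen.add(kmer)
    return [(kmer, 1) for kmer in seen]


def reduce_counts(pairs):
    """Reduce step (simulated): sum the emitted counts per k-mer."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def frequent_kmers(sequences, max_k, min_support):
    """Level-wise search: keep k-mers found in at least min_support sequences."""
    allowed = None  # no pruning is possible on the first pass (k = 1)
    selected = []
    for k in range(1, max_k + 1):
        pairs = chain.from_iterable(map_kmers(s, k, allowed) for s in sequences)
        counts = reduce_counts(pairs)
        allowed = {kmer for kmer, c in counts.items() if c >= min_support}
        if not allowed:
            break
        selected.extend(sorted(allowed))
    return selected


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(sequences, labels, max_k=2, min_support=2, min_corr=0.5):
    """Keep frequent k-mers whose presence correlates with the class label."""
    kept = []
    for kmer in frequent_kmers(sequences, max_k, min_support):
        column = [1 if kmer in s else 0 for s in sequences]
        if abs(pearson(column, labels)) >= min_corr:
            kept.append(kmer)
    return kept


# Toy data (made up for illustration): two sequences per class.
sequences = ["MKVA", "MKVL", "GGSA", "GGSL"]
labels = [1, 1, 0, 0]
features = select_features(sequences, labels)
```

On this toy input, the support threshold discards 2-mers that occur in only one sequence, and the correlation filter discards residues such as `A` and `L` that appear equally often in both classes. A real deployment would distribute the map and reduce steps across a cluster and would likely use a different correlation criterion than the one assumed here.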
Pages: 255-265
Number of pages: 11
Related papers
50 in total
  • [1] A Novel Technique of Feature Selection with ReliefF and CFS for Protein Sequence Classification
    Kaur, Kiranpreet
    Patil, Nagamma
    RECENT FINDINGS IN INTELLIGENT COMPUTING TECHNIQUES, VOL 1, 2019, 707 : 399 - 405
  • [2] Feature Selection and Classification of Big Data Using MapReduce Framework
    Devi, D. Renuka
    Sasikala, S.
    INTELLIGENT COMPUTING, INFORMATION AND CONTROL SYSTEMS, ICICCS 2019, 2020, 1039 : 666 - 673
  • [3] A fast and novel approach based on grouping and weighted mRMR for feature selection and classification of protein sequence data
    Kaur, Kiranpreet
    Patil, Nagamma
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2020, 23 (01) : 47 - 61
  • [4] A Novel Approach for Feature Selection Based on MapReduce for Biomarker Discovery
    Kourid, Ahlem
    Batouche, Mohamed
    INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE ANALYSIS APPLICATIONS, 2015,
  • [5] Sequence-Based Classification Using Discriminatory Motif Feature Selection
    Xiong, Hao
    Capurso, Daniel
    Sen, Saunak
    Segal, Mark R.
    PLOS ONE, 2011, 6 (11):
  • [6] Optimization of sEMG Classification Model Based on Correlation Analysis and Feature Selection
    Li, Zhengzhen
    Li, Ke
    Li, Jinping
    Wei, Na
    2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2020), 2020, : 402 - 407
  • [7] Selection of Network Feature Attribute Based On Classification Discrimination And Correlation Analysis
    Liu, Yang
    Ma, Hongwei
    Li, Kuangdai
    Yi, Hang
    Yan, Xiaotao
    Kang, Jian
    COMPANION OF THE 2020 IEEE 20TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY (QRS-C 2020), 2020, : 328 - 333
  • [8] Protein sequence classification using feature hashing
    Caragea, Cornelia
    Silvescu, Adrian
    Mitra, Prasenjit
    PROTEOME SCIENCE, 2012, 10
  • [9] Protein Sequence Classification Using Feature Hashing
    Caragea, Cornelia
    Silvescu, Adrian
    Mitra, Prasenjit
    2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011), 2011, : 538 - 543
  • [10] Protein sequence classification using feature hashing
    Cornelia Caragea
    Adrian Silvescu
    Prasenjit Mitra
    Proteome Science, 10