FEATURES SELECTION USING PARAMETRIC AND NON-PARAMETRIC METHODS: TAG SNPs SELECTION USING GA-SVM AND GA-KNN

Cited by: 1
Authors
Elatraby, Amr I. A. [1 ]
Wahba, Rashad R. T. [1 ]
Affiliations
[1] Ain Shams Univ, Fac Commerce, Stat Math & Insurance Dept, Cairo, Egypt
Keywords
Single Nucleotide Polymorphisms (SNPs); tag SNPs; Support Vector Machine (SVM); K-Nearest Neighbor (KNN); Genetic Algorithm (GA);
DOI
10.17654/ADASMay2015_105_123
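Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code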
020208 ; 070103 ; 0714 ;
Abstract
The study of genetic variations in the human genome, especially Single Nucleotide Polymorphisms (SNPs), can lead to the discovery of new methods to prevent, diagnose and treat diseases. Examining every SNP in the human genome is too expensive, so a small subset of informative SNPs, called tag SNPs, must be selected. In this study, two methods for the selection of tag SNPs are presented. The first method, GA-SVM, integrates the Support Vector Machine (SVM) as a parametric technique with the Genetic Algorithm (GA). The second method, GA-KNN, integrates the K-Nearest Neighbor (KNN) as a non-parametric technique with GA. The two methods are tested on a group of genes that are known to be related to the natural clearance of Hepatitis C Virus (HCV). The SNP data for these genes were extracted from the HapMap site (http://hapmap.org). The prediction accuracy of each method was evaluated using 10-Fold Cross Validation (10-FCV). Our results show that, although GA-SVM achieves higher prediction accuracy than GA-KNN when a very small number of tag SNPs is selected, GA-KNN achieves higher prediction accuracy than GA-SVM in all other cases. In addition, our results indicate that GA-KNN requires more computing time than GA-SVM.
Pages: 105-123
Page count: 19
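
The abstract describes GA-KNN only at a high level, so the following is a minimal pure-Python sketch of the general idea rather than the authors' implementation. Several choices are assumptions that the record does not confirm: fitness is taken to be the mean cross-validated accuracy of predicting each non-tag SNP from the tag SNPs, KNN uses Hamming distance with k = 3, and the GA operators (tournament selection, union crossover, swap mutation, two elites) are illustrative. The names `knn_predict`, `cv_accuracy`, `ga_select_tags` and the toy `data` are hypothetical.

```python
import random

def knn_predict(train, x, tags, target, k):
    """Predict column `target` of row x by majority vote of its k nearest rows."""
    # Assumption: Hamming distance computed on the tag columns only.
    nearest = sorted(train, key=lambda r: sum(r[c] != x[c] for c in tags))[:k]
    votes = [r[target] for r in nearest]
    return max(sorted(set(votes)), key=votes.count)

def cv_accuracy(data, tags, folds=10, k=3):
    """Mean k-fold accuracy of predicting every non-tag column from the tags."""
    targets = [c for c in range(len(data[0])) if c not in tags]
    hits = total = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th row forms one fold
        train = [r for i, r in enumerate(data) if i % folds != f]
        for x in test:
            for t in targets:
                hits += knn_predict(train, x, tags, t, k) == x[t]
                total += 1
    return hits / total

def ga_select_tags(data, n_tags, pop_size=20, generations=15, seed=0):
    """Search for n_tags columns (n_tags < number of columns) maximising cv_accuracy."""
    rng = random.Random(seed)
    n_cols = len(data[0])
    cache = {}

    def fitness(ind):
        if ind not in cache:
            cache[ind] = cv_accuracy(data, ind)
        return cache[ind]

    pop = [tuple(sorted(rng.sample(range(n_cols), n_tags))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]  # elitism: carry the two best subsets forward unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            pool = sorted(set(a) | set(b))            # crossover: draw from the union
            child = set(rng.sample(pool, n_tags))
            if rng.random() < 0.2:                    # mutation: swap one tag out
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([c for c in range(n_cols) if c not in child]))
            nxt.append(tuple(sorted(child)))
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

# Toy data: columns 0-2 copy one latent bit and columns 3-5 copy another,
# so one tag from each block is enough to recover all remaining columns.
data = [(a, a, a, b, b, b) for _ in range(10) for a in (0, 1) for b in (0, 1)]
tags, acc = ga_select_tags(data, n_tags=2)
print(tags, acc)
```

A GA-SVM variant in the same spirit would keep the search loop and replace `knn_predict` with an SVM classifier trained on the tag columns; the fitness cache matters in either case because each evaluation repeats a full cross-validation.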