Protein Classification Using N-gram Technique and Association Rules

Cited by: 1
Authors
Kabli, Fatima [1]
Hamou, Reda Mohamed [1]
Amine, Abdelmalek [1]
Affiliations
[1] Tahar Moulay Univ Saida, Dept Comp Sci, GeCode Lab, Saida, Algeria
Keywords
Apriori; Classification; KDD Process; N-Gram; Protein Sequences; Relevant Association Rules
DOI
10.4018/IJSI.2018040106
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Knowledge extraction from biological data is receiving increasing attention; it addresses general tasks such as grouping, classification, and association. Protein classification is an important activity that helps biologists respond to biological needs. For this reason, the authors present a global framework, inspired by the knowledge extraction process for biological data, that classifies proteins from their primary structure using association rules. The framework has three main steps. The first, the pre-processing phase, extracts descriptors using the N-gram technique. The second extracts association rules by applying the Apriori algorithm. The third selects the relevant rules and applies the classifier. Experiments were performed on five classes of proteins extracted from the UniProt data bank, and the results were compared with five classification methods in the WEKA platform. The results satisfied the authors' aim of proposing an effective protein classifier supported by the N-gram technique and the Apriori algorithm.
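The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the N-gram length, the support and confidence thresholds, or the rule-selection criterion, so the values and the function names below (`ngrams`, `frequent_itemsets`, `class_rules`, `predict`) are assumptions chosen for illustration, and the toy sequences are invented rather than taken from UniProt.

```python
from collections import Counter
from itertools import combinations


def ngrams(sequence, n=2):
    """Step 1 (assumed form): the set of overlapping n-grams of a sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def frequent_itemsets(transactions, min_support, max_size=2):
    """Step 2: Apriori-style level-wise search for frequent itemsets."""
    total = len(transactions)
    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        # Count how many transactions contain each candidate itemset.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: k / total for c, k in counts.items()
                 if k / total >= min_support}
        frequent.update(level)
        # Join frequent itemsets into candidates that are one item larger,
        # then prune candidates that have an infrequent subset.
        size += 1
        joined = {a | b for a, b in combinations(list(level), 2)
                  if len(a | b) == size}
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, size - 1))}
    return frequent


def class_rules(sequences, labels, n=2, min_support=0.5, min_confidence=0.8):
    """Step 3a (assumed criterion): keep rules 'itemset -> class' whose
    confidence reaches min_confidence."""
    transactions = [frozenset(ngrams(s, n)) for s in sequences]
    rules = []
    for itemset, support in frequent_itemsets(transactions, min_support).items():
        covered = [lab for t, lab in zip(transactions, labels) if itemset <= t]
        label, hits = Counter(covered).most_common(1)[0]
        confidence = hits / len(covered)
        if confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    return rules


def predict(rules, sequence, n=2, default=None):
    """Step 3b: classify with the best matching rule
    (highest confidence, then highest support)."""
    grams = ngrams(sequence, n)
    matching = [r for r in rules if r[0] <= grams]
    if not matching:
        return default
    return max(matching, key=lambda r: (r[3], r[2], len(r[0])))[1]


# Toy, invented sequences and class labels (not UniProt data).
toy_sequences = ["ACDA", "ACDC", "GHKG", "GHKH"]
toy_labels = ["A", "A", "B", "B"]
toy_rules = class_rules(toy_sequences, toy_labels)
print(predict(toy_rules, "ACDK"))
```

Treating each sequence as a transaction of N-grams is one common way to make sequence data fit Apriori; the paper may encode descriptors differently.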
Pages: 77-89
Page count: 13