A New N-gram Feature Extraction-Selection Method for Malicious Code

Cited by: 0
Authors
Parvin, Hamid [1 ]
Minaei, Behrouz [1 ]
Karshenas, Hossein [1 ]
Beigi, Akram [1 ]
Affiliation
[1] Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran
Keywords
Malicious Code; N-gram Analysis; Feature Selection
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
N-grams are the basic features commonly used in sequence-based malicious code detection methods in computer virology research. Empirical results from previous work suggest that, while short n-grams are easier to extract, longer n-grams better represent the characteristics of the underlying executables. However, as the length of an n-gram increases, the feature space grows exponentially and demands considerable storage and computational resources. Feature selection has therefore become the most challenging step in building an accurate detection system based on byte n-grams. In this paper we propose an efficient feature extraction method that uses both adjacent and non-adjacent bigrams in order to capture more information. Additionally, we present a novel boosting feature selection method based on a genetic algorithm. Our experimental results indicate that the proposed detection system detects virus programs far more accurately than the best previously known methods.
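The extraction idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not define "non-adjacent bigram", so the sketch assumes it means a pair of bytes separated by a fixed number of skipped bytes; the names `byte_bigrams`, `max_gap`, and `feature_space_size` are hypothetical and not taken from the paper. The second function only illustrates why the feature space grows exponentially with n for byte n-grams.

```python
from collections import Counter


def byte_bigrams(data: bytes, max_gap: int = 2) -> Counter:
    """Count (gap, first_byte, second_byte) triples in a byte string.

    gap == 0 is the ordinary adjacent bigram. gap == k means k bytes are
    skipped between the two members; this is an assumed reading of
    "non-adjacent bigram", not the paper's stated definition.
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1  # distance between the two bytes of the pair
        for i in range(len(data) - step):
            counts[(gap, data[i], data[i + step])] += 1
    return counts


def feature_space_size(n: int, alphabet: int = 256) -> int:
    """Number of distinct byte n-grams that can possibly occur."""
    return alphabet ** n


# Toy input: six bytes in which the pair 0x4D 0x5A occurs twice.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
counts = byte_bigrams(sample, max_gap=1)
print(counts[(0, 0x4D, 0x5A)])  # adjacent occurrences of 4D 5A
print(feature_space_size(2), feature_space_size(4))
```

Under this assumption, each allowed gap adds at most 256^2 = 65,536 candidate features, whereas moving from bigrams to 4-grams raises the candidate count to 256^4. The genetic-algorithm-based boosting selection step is not sketched here because the abstract gives no details of it.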
Pages: 98-107
Number of pages: 10
Related papers
50 in total
  • [1] A new N-gram feature extraction-selection method for malicious code
    School of Computer Engineering, Iran University of Science and Technology , Tehran, Iran
    [J]. Lect. Notes Comput. Sci., PART 2 (98-107):
  • [2] Boosting feature selection in a new non-adjacent N-gram for malicious code detection
    [J]. Parvin, Hamid (parvin@iust.ac.ir), 1600, CRL Publishing (22):
  • [3] New malicious code detection based on N-gram analysis and rough set theory
    Zhang, Boyun
    Yin, Jianping
    Hao, Jingbo
    Wang, Shulin
    Zhang, Dingxing
    Tang, Wensheng
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 1229 - 1232
  • [4] New malicious code detection based on N-gram analysis and rough set theory
    Zhang, Boyun
    Yin, Jianping
    Hao, Jingbo
    Wang, Shulin
    Zhang, Dingxing
    [J]. COMPUTATIONAL INTELLIGENCE AND SECURITY, 2007, 4456 : 626 - 633
  • [5] N-gram feature selection for authorship identification
    Houvardas, John
    Stamatatos, Efstathios
    [J]. ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, PROCEEDINGS, 2006, 4183 : 77 - 86
  • [6] Apriori and N-gram Based Chinese Text Feature Extraction Method
    王晔
    黄上腾
    [J]. Journal of Shanghai Jiaotong University(Science), 2004, (04) : 11 - 14
  • [7] LANGUAGE IDENTIFICATION BASED ON N-GRAM FEATURE EXTRACTION METHOD BY USING CLASSIFIERS
    Bayrak Hayta, Sengul
    Takci, Hidayet
    Eminli, Mubariz
    [J]. ISTANBUL UNIVERSITY-JOURNAL OF ELECTRICAL AND ELECTRONICS ENGINEERING, 2013, 13 (02): : 1629 - 1638
  • [8] Partitioning Based N-Gram Feature Selection for Malware Classification
    Hu, Weiwei
    Tan, Ying
    [J]. DATA MINING AND BIG DATA, DMBD 2016, 2016, 9714 : 187 - 195
  • [9] An N-Gram Based Method for Bengali Keyphrase Extraction
    Sarkar, Kamal
    [J]. INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 36 - 41
  • [10] A new type of feature - Loose N-gram feature in text categorization
    Zhang, Xian
    Zhu, Xiaoyan
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, PT 1, PROCEEDINGS, 2007, 4477 : 378 - +