A New N-gram Feature Extraction-Selection Method for Malicious Code

Cited by: 0

Authors:

Parvin, Hamid [1]

Minaei, Behrouz [1]

Karshenas, Hossein [1]

Beigi, Akram [1]

Affiliation:

[1] Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran

Source:

ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, PT II | 2011 / Vol. 6594

Keywords:

Malicious Code; N-gram Analysis; Feature Selection;

DOI:

Not available

Chinese Library Classification:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

N-grams are the basic features commonly used in sequence-based malicious code detection methods in computer virology research. Empirical results from previous work suggest that, while short n-grams are easier to extract, the characteristics of the underlying executables are better represented by longer n-grams. However, as the length of an n-gram increases, the feature space grows exponentially and demands considerable storage and computational resources. Feature selection has therefore turned out to be the most challenging step in building an accurate detection system based on byte n-grams. In this paper we propose an efficient feature extraction method in which, to gain more information, both adjacent and non-adjacent bigrams are used. Additionally, we present a novel boosting feature selection method based on a genetic algorithm. Our experimental results indicate that the proposed detection system detects virus programs far more accurately than the best previously known methods.
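
As an illustration of the idea of adjacent and non-adjacent bigrams mentioned in the abstract, the following is a minimal sketch only. The abstract does not say how non-adjacent bigrams are defined, so the gap-based pairing, the function name `gapped_bigrams`, and the `max_gap` parameter are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def gapped_bigrams(data: bytes, max_gap: int = 2) -> Counter:
    """Count byte pairs (first, second, gap) in a byte sequence.

    Assumed scheme (hypothetical, not from the paper):
    gap = 0 pairs each byte with its immediate successor (adjacent bigram);
    gap = k pairs each byte with the byte k + 1 positions later
    (non-adjacent bigram that skips k bytes).
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1
        for i in range(len(data) - step):
            counts[(data[i], data[i + step], gap)] += 1
    return counts


# Hypothetical 6-byte sample; real input would be the bytes of an executable.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
features = gapped_bigrams(sample, max_gap=1)
```

With this scheme each gap value contributes at most 256 × 256 distinct pairs, so the feature space grows linearly with `max_gap` rather than exponentially with the n-gram length, which may be one motivation for using bigrams with gaps in place of longer n-grams.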


Pages: 98-107

Page count: 10
