N-gram analysis for computer virus detection

Cited by: 85
Authors
Reddy, D. Krishna Sandeep [1]
Pujari, Arun K. [1]
Affiliations
[1] Univ Hyderabad, Artificial Intelligence Lab, Hyderabad 500046, Andhra Pradesh, India
DOI
10.1007/s11416-006-0027-8
CLC classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Generic computer virus detection is the need of the hour, as most commercial antivirus software fails to detect unknown and new viruses. Motivated by the success of data mining and machine learning techniques in intrusion detection systems, recent research on detecting malicious executables is directed towards devising efficient non-signature-based techniques that can profile program characteristics from a set of training examples. Byte sequences and byte n-grams are considered to be the basis of feature extraction. But as the number of n-grams is very large, several methods of feature selection have been proposed in the literature. A recent report on the use of information-gain-based feature selection has yielded the best-known result in classifying malicious executables from benign ones. We observe that information gain models the presence of an n-gram in one class and its absence in the other. Through a simple example we show that this may lead to erroneous results. In this paper, we describe a new feature selection measure, the class-wise document frequency of byte n-grams. We empirically demonstrate that the proposed method is a better method for feature selection. For detection, we combine several classifiers using Dempster-Shafer theory for better classification accuracy instead of using any single classifier. Our experimental results show that such a scheme detects virus programs far more efficiently than the earlier known methods.
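The feature selection idea named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so this assumes that the class-wise document frequency of an n-gram is the number of files in a class that contain it at least once, and that the top-k n-grams of each class are kept. The function names, the toy byte strings, and the choice of n = 2 and k = 1 are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def byte_ngrams(data: bytes, n: int) -> Set[bytes]:
    """Return the set of distinct byte n-grams occurring in one file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def classwise_document_frequency(
    files_by_class: Dict[str, Iterable[bytes]], n: int
) -> Dict[str, Counter]:
    """For each class, count how many files contain each n-gram at least once.

    Assumption: "document frequency" is computed per class, one count per file.
    """
    df: Dict[str, Counter] = {}
    for label, files in files_by_class.items():
        counts: Counter = Counter()
        for data in files:
            # A set is used so repeats inside one file are counted only once.
            counts.update(byte_ngrams(data, n))
        df[label] = counts
    return df


def select_features(df: Dict[str, Counter], k: int) -> List[bytes]:
    """Keep the k most frequent n-grams of each class and return their union."""
    selected: Set[bytes] = set()
    for counts in df.values():
        selected.update(gram for gram, _ in counts.most_common(k))
    return sorted(selected)


# Hypothetical toy corpus: two "malicious" and two "benign" byte strings.
toy = {
    "malicious": [b"\x90\x90\xeb\xfe", b"\x90\x90\xcc"],
    "benign": [b"\x00\x01\x00\x01", b"\x00\x01\x02"],
}
df = classwise_document_frequency(toy, n=2)
features = select_features(df, k=1)
```

Selecting per class, rather than by a single score across classes, keeps n-grams that are frequent within one class regardless of how they behave in the other; whether this matches the paper's procedure in detail is an assumption of the sketch.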
Pages: 231-239
Number of pages: 9