Malware Detection and Classification Based on n-grams Attribute Similarity

Cited by: 27
Authors
Zhang Fuyong [1]
Zhao Tiezhou [1]
Affiliation
[1] Dongguan University of Technology, School of Computer Science and Network Security, Dongguan, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
malware detection; attribute similarity; machine learning; unknown malware; static analysis;
DOI
10.1109/CSE-EUC.2017.157
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Unknown malware has increased dramatically, yet existing security software cannot identify it effectively. In this paper, we propose a new malware detection and classification method based on n-grams attribute similarity. We extract all byte-code n-grams from the training samples and select the most relevant ones as attributes. After computing the average attribute values for the malware and benign classes separately, we classify a test sample as malware or benign according to the similarity between its attributes and each of the two class averages. We compare our method with four machine learning methods: Naive Bayes, Bayesian Networks, Support Vector Machine, and C4.5 Decision Tree. Experimental results on both a public dataset (Open Malware Benchmark) and a private, self-collected dataset show that our method outperforms the other four methods.
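The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how attributes are ranked or which similarity measure is used, so two choices here are assumptions rather than the authors' method: attributes are ranked by the difference in document frequency between the two classes, and similarity is measured with cosine similarity. All function names and the toy byte strings are hypothetical.

```python
from collections import Counter
import math


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping byte n-gram in one sample."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def select_attributes(malware, benign, n=2, k=4):
    """Keep the k n-grams whose document frequency differs most between
    the two classes (an assumed relevance criterion, not the paper's)."""
    def doc_freq(samples):
        df = Counter()
        for sample in samples:
            df.update(set(byte_ngrams(sample, n)))
        return df

    df_m, df_b = doc_freq(malware), doc_freq(benign)
    score = {
        g: abs(df_m[g] / len(malware) - df_b[g] / len(benign))
        for g in set(df_m) | set(df_b)
    }
    # Ties are broken by byte order so the result is deterministic.
    return sorted(score, key=lambda g: (-score[g], g))[:k]


def vectorize(sample, attributes, n=2):
    """Relative frequency of each selected n-gram in one sample."""
    counts = byte_ngrams(sample, n)
    total = max(sum(counts.values()), 1)
    return [counts[g] / total for g in attributes]


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity (assumed measure); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(sample, attributes, mal_mean, ben_mean, n=2):
    """Label a sample by whichever class average it is more similar to."""
    v = vectorize(sample, attributes, n)
    return "malware" if cosine(v, mal_mean) > cosine(v, ben_mean) else "benign"


# Toy training data (made up for illustration only).
malware = [b"\x90\x90\x90\xcc", b"\x90\x90\xcc\xcc"]
benign = [b"\x00\x01\x02\x03", b"\x00\x01\x00\x01"]

attrs = select_attributes(malware, benign, n=2, k=4)
mal_mean = mean_vector([vectorize(s, attrs) for s in malware])
ben_mean = mean_vector([vectorize(s, attrs) for s in benign])

print(classify(b"\x90\x90\x90\x90", attrs, mal_mean, ben_mean))
print(classify(b"\x00\x01\x00\x01\x00", attrs, mal_mean, ben_mean))
```

With only two class-average vectors to compare against, classification cost is independent of the number of training samples, which is one plausible reason to prefer this design over instance-based comparison.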
Pages: 793-796
Number of pages: 4