Malware Detection and Classification Based on n-grams Attribute Similarity

Cited by: 27
Authors
Zhang Fuyong [1 ]
Zhao Tiezhou [1 ]
Affiliation
[1] Dongguan Univ Technol, Sch Comp Sci & Network Secur, Dongguan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
malware detection; attribute similarity; machine learning; unknown malware; static analysis;
DOI
10.1109/CSE-EUC.2017.157
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
The volume of unknown malware has increased dramatically, but existing security software cannot identify it effectively. In this paper, we propose a new malware detection and classification method based on n-gram attribute similarity. We extract all byte-code n-grams from the training samples and select the most relevant ones as attributes. After computing the average attribute values for the malware and benign classes separately, we classify a test sample as malware or benign according to the similarity between its attributes and the two class averages. We compare our method with several machine learning methods: Naive Bayes, Bayesian Networks, Support Vector Machine, and the C4.5 Decision Tree. Experimental results on a public dataset (Open Malware Benchmark) and a private, self-collected dataset both show that our method outperforms the other four methods.
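The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how "most relevant" n-grams are chosen or which similarity measure is used, so this sketch assumes the difference in document frequency between classes as the relevance score, binary presence/absence attributes, and cosine similarity. All function names, the toy byte strings, and the parameters `n` and `k` are hypothetical.

```python
from collections import Counter
import math


def byte_ngrams(data: bytes, n: int = 2) -> set:
    """Return the set of distinct byte n-grams occurring in a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def select_attributes(malware, benign, n=2, k=3):
    """Pick the k n-grams whose document frequency differs most between classes.

    Assumption: the relevance criterion is not given in the abstract; the
    absolute difference in per-class document frequency is a stand-in.
    """
    mal_df = Counter(g for s in malware for g in byte_ngrams(s, n))
    ben_df = Counter(g for s in benign for g in byte_ngrams(s, n))

    def score(g):
        return abs(mal_df[g] / len(malware) - ben_df[g] / len(benign))

    grams = set(mal_df) | set(ben_df)
    # Ties are broken by byte order so the selection is deterministic.
    return sorted(grams, key=lambda g: (-score(g), g))[:k]


def to_vector(sample, attributes, n=2):
    """Binary attribute vector: 1.0 if the n-gram occurs in the sample."""
    grams = byte_ngrams(sample, n)
    return [1.0 if a in grams else 0.0 for a in attributes]


def mean_vector(samples, attributes, n=2):
    """Average attribute vector of one class."""
    vectors = [to_vector(s, attributes, n) for s in samples]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(sample, attributes, mal_mean, ben_mean, n=2):
    """Label a sample by whichever class average it is more similar to."""
    vec = to_vector(sample, attributes, n)
    return "malware" if cosine(vec, mal_mean) > cosine(vec, ben_mean) else "benign"


# Toy training data (hypothetical byte strings, not real samples).
malware_train = [b"\x90\x90\xcc\xcc", b"\x90\x90\xeb\xfe"]
benign_train = [b"\x55\x8b\xec\x5d", b"\x55\x8b\xec\xc3"]

attributes = select_attributes(malware_train, benign_train)
mal_mean = mean_vector(malware_train, attributes)
ben_mean = mean_vector(benign_train, attributes)
```

Calling `classify(sample, attributes, mal_mean, ben_mean)` on a new byte string then returns a label. Because each class is summarized by a single averaged vector, classifying a sample needs only two similarity computations, however many training samples there were.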
Pages: 793-796
Number of pages: 4