Classification of ransomware families with machine learning based on N-gram of opcodes

Cited by: 140
Authors
Zhang, Hanqi [1, 2]
Xiao, Xi [2]
Mercaldo, Francesco [4]
Ni, Shiguang [3]
Martinelli, Fabio [5]
Sangaiah, Arun Kumar [6]
Affiliations
[1] Cent China Normal Univ, Coll Phys Sci & Technol, Wuhan, Hubei, Peoples R China
[2] Tsinghua Univ, Grad Sch Shenzhen, Shenzhen, Peoples R China
[3] Tsinghua Univ, Grad Sch Shenzhen, Div Social Sci & Management, Shenzhen, Peoples R China
[4] Natl Res Council Italy, Inst Informat & Telemat, Pisa, Italy
[5] Natl Res Council Italy, Inst Informat & Telemat, Secur Grp, Pisa, Italy
[6] VIT Univ, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Keywords
Ransomware classification; Static analysis; Opcode; Machine learning; N-gram;
DOI
10.1016/j.future.2018.07.052
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Ransomware is a special type of malware that can lock victims' screens and/or encrypt their files to extort ransoms, causing great damage to users. Mapping ransomware into families is useful for identifying the variants of a known ransomware sample and for reducing analysts' workload. However, ransomware that can fingerprint the environment can evade previous work based on dynamic analysis. To overcome this shortcoming, we propose what is, to the best of our knowledge, the first approach to classifying ransomware based on static analysis. First, opcode sequences from ransomware samples are transformed into N-gram sequences. Then, term frequency-inverse document frequency (TF-IDF) is calculated for each N-gram to select the feature N-grams that discriminate best between families. Finally, we treat the vectors composed of the TF values of the feature N-grams as the feature vectors and feed them to five machine-learning methods to perform ransomware classification. Six evaluation criteria are employed to validate the model. Thorough experiments on real datasets demonstrate that our approach achieves a best Accuracy of 91.43%. Furthermore, the average F1-measure for the "wannacry" ransomware family is up to 99%, and the Accuracy of binary classification is up to 99.3%. The proposed method can detect and classify ransomware that can fingerprint the environment. In addition, we find that feature N-grams of different lengths require different feature dimensions to achieve similar classifier performance. © 2018 Elsevier B.V. All rights reserved.
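The feature pipeline described in the abstract (opcode N-grams, TF-IDF-based selection, TF feature vectors) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names, the relative-frequency definition of TF, the plain log(N/df) form of IDF, and the rule of ranking N-grams by their summed TF-IDF are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def ngrams(opcodes, n):
    """Return the contiguous opcode N-grams of one sample as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(grams):
    """Relative frequency of each N-gram within one sample (assumed TF)."""
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def select_features(samples, n, k):
    """Pick the k N-grams with the highest TF-IDF summed over all samples.

    Assumption: IDF = log(N / df); the paper's exact formula and
    selection rule may differ.
    """
    tfs = [term_frequencies(ngrams(s, n)) for s in samples]
    doc_freq = Counter(g for tf in tfs for g in tf)
    scores = Counter()
    for tf in tfs:
        for gram, value in tf.items():
            scores[gram] += value * math.log(len(samples) / doc_freq[gram])
    # Sort by descending score, then by N-gram, so ties break deterministically.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:k]


def feature_vector(opcodes, features, n):
    """TF values of the selected feature N-grams for one sample."""
    tf = term_frequencies(ngrams(opcodes, n))
    return [tf.get(g, 0.0) for g in features]


# Hypothetical toy opcode sequences, not taken from any real sample.
samples = [
    ["push", "mov", "call", "xor", "mov", "call"],
    ["push", "mov", "call", "add", "ret"],
    ["xor", "xor", "mov", "jmp", "ret"],
]
features = select_features(samples, n=2, k=3)
vectors = [feature_vector(s, features, n=2) for s in samples]
```

The resulting `vectors`, paired with family labels, could then be passed to any standard classifier. The abstract mentions five machine-learning methods but does not name them, so none is shown here.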
Pages: 211-221
Page count: 11
Related papers
50 in total
  • [31] An n-gram based approach to the automatic classification of schoolchildren's writing
    Cicres, Jordi
    Queralt, Sheila
    [J]. VIAL-VIGO INTERNATIONAL JOURNAL OF APPLIED LINGUISTICS, 2019, 16 : 53 - 80
  • [32] Web Page Classification using n-gram based URL Features
    Rajalakshmi, R.
    Aravindan, Chandrabose
    [J]. 2013 FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2013, : 15 - 21
  • [33] Ransomware Classification and Detection With Machine Learning Algorithms
    Masum, Mohammad
    Faruk, Md Jobair Hossain
    Shahriar, Hossain
    Qian, Kai
    Lo, Dan
    Adnan, Muhaiminul Islam
    [J]. 2022 IEEE 12TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2022, : 316 - 322
  • [34] Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques
    Ahmed, Hadeer
    Traore, Issa
    Saad, Sherif
    [J]. INTELLIGENT, SECURE, AND DEPENDABLE SYSTEMS IN DISTRIBUTED AND CLOUD ENVIRONMENTS (ISDDC 2017), 2017, 10618 : 127 - 138
  • [35] Analysis of N-gram model on Telugu Document Classification
    Rani, B. Padmaja
    Vardhan, B. Vishnu
    Durga, A. Kanaka
    Reddy, L. Pratap
    Babu, A. Vinaya
    [J]. 2008 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-8, 2008, : 3199 - +
  • [36] An investigation of byte n-gram features for malware classification
    Raff, Edward
    Zak, Richard
    Cox, Russell
    Sylvester, Jared
    Yacci, Paul
    Ward, Rebecca
    Tracy, Anna
    McLean, Mark
    Nicholas, Charles
    [J]. JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2018, 14 (01): : 1 - 20
  • [37] Combat Mobile Malware via N-gram Based Deep Learning
    Dusun, Burak
    Bulut, Irfan
    Aygun, R. Can
    Yavuz, A. Gokhan
    [J]. 2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [38] Implementation of Machine Learning Algorithms in Arabic Sentiment Analysis Using N-Gram Features
    Gamal, Donia
    Alfonse, Marco
    El-Horbaty, El-Sayed M.
    Salem, Abdel-Badeeh M.
    [J]. PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY [ICICT-2019], 2019, 154 : 332 - 340
  • [39] A Corpus Based N-gram Hybrid Approach of Bengali to English Machine Translation
    Rahman, Mohammad Masudur
    Kabir, Md Faisal
    Huda, Mohammad Nurul
    [J]. 2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018,
  • [40] N-gram Based WSD for Improving Accuracy of Machine Translation using TM
    Rawat, Sunita
    Chandak, Manoj
    Khan, Tabassum
    [J]. HELIX, 2018, 8 (05): : 3916 - 3918