Malware Detection Using Memory Analysis Data in Big Data Environment

Cited by: 16
Authors
Dener, Murat [1]
Ok, Gokce [1]
Orman, Abdullah [2]
Affiliations
[1] Gazi Univ, Grad Sch Nat & Appl Sci, Informat Secur Engn, TR-06560 Ankara, Turkey
[2] Ankara Yildirim Beyazit Univ, Vocat Sch Tech Sci, Dept Comp Technol, TR-06760 Ankara, Turkey
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 17
Keywords
malware memory analysis; big data; machine learning; deep learning; Apache Spark; classification; machine learning techniques
DOI
10.3390/app12178604
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Malware is a significant threat that has grown with the spread of technology, which makes its detection a critical issue. Static and dynamic methods are widely used to detect malware, but these traditional methods may fall short against advanced malware. Because malware leaves various traces in memory, data obtained through memory analysis can provide important insights into its behavior and patterns. For this reason, memory analysis deserves study as a malware detection method. This study proposes the use of memory data in malware detection. Detection was carried out by applying various deep learning and machine learning approaches to memory data in a big data environment, using PySpark on the Apache Spark big data platform in Google Colaboratory. Experiments were performed on the balanced CIC-MalMem-2022 dataset. Binary classification was performed using the Random Forest, Decision Tree, Gradient Boosted Tree, Logistic Regression, Naive Bayes, Linear Support Vector Machine, Multilayer Perceptron, Deep Feed Forward Neural Network, and Long Short-Term Memory algorithms. The performance of the algorithms was compared using the Accuracy, F1-score, Precision, Recall, and AUC metrics. The most successful detection was obtained with Logistic Regression, at 99.97% accuracy, followed by Gradient Boosted Tree at 99.94%. Naive Bayes showed the lowest performance, with an accuracy of 98.41%. According to these results, data obtained from memory analysis is very useful in detecting malware, and deep learning and machine learning approaches trained on memory datasets achieved very successful results.
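The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' PySpark pipeline: the function names (`confusion_counts`, `binary_metrics`, `auc`) are hypothetical, the labels and scores are toy values rather than CIC-MalMem-2022 data, and label 1 is assumed to mean "malware".

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and classifier scores (illustrative only).
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(binary_metrics(y_true, y_pred))
    print(auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small illustration; at the scale the abstract describes, a distributed evaluator would be the more practical choice.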
Pages: 21
Related papers
50 in total
  • [1] Improving malware detection using big data and ensemble learning
    Gupta, Deepak
    Rani, Rinkle
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2020, 86
  • [2] A Hybrid System for Malware Detection on Big Data
    De Paola, Alessandra
    Gaglio, Salvatore
    Lo Re, Giuseppe
    Morana, Marco
    [J]. IEEE INFOCOM 2018 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2018, : 45 - 50
  • [3] Application of Big Data for Medical Data Analysis Using Hadoop Environment
    Roobini, M. S.
    Lakshmi, M.
    [J]. INTERNATIONAL CONFERENCE ON INTELLIGENT DATA COMMUNICATION TECHNOLOGIES AND INTERNET OF THINGS, ICICI 2018, 2019, 26 : 1128 - 1135
  • [4] Big Data Environment for Geospatial Data Analysis
    Praveen, P.
    Babu, Ch. Jayanth
    Rama, B.
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND ELECTRONICS SYSTEMS (ICCES), 2016, : 573 - 578
  • [5] Clustering Analysis for Malware Behavior Detection using Registry Data
    Rosli, Nur Adibah
    Mohamed, Warusia
    Faizal, M. A.
    Selamat, Siti Rahayu
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (12) : 93 - 102
  • [6] Big Data Framework for Zero-Day Malware Detection
    Gupta, Deepak
    Rani, Rinkle
    [J]. CYBERNETICS AND SYSTEMS, 2018, 49 (02) : 103 - 121
  • [7] A New Approach to Malware Detection by Comparative Analysis of Data Structures in a Memory Image
    Aghaeikheirabady, Masoume
    Farshchi, Seyyed Mohammad Reza
    Shirazi, Hossein
    [J]. 2014 INTERNATIONAL CONGRESS ON TECHNOLOGY, COMMUNICATION AND KNOWLEDGE (ICTCK), 2014,
  • [8] Scalable malware detection system using big data and distributed machine learning approach
    Manish Kumar
    [J]. Soft Computing, 2022, 26 : 3987 - 4003
  • [9] Scalable malware detection system using big data and distributed machine learning approach
    Kumar, Manish
    [J]. SOFT COMPUTING, 2022, 26 (08) : 3987 - 4003
  • [10] Toward Cognitive Data Analysis with Big Data Environment
    Sik, David
    Csorba, Kristof
    Ekler, Peter
    [J]. 2018 9TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM), 2018, : 23 - 28