A Novel Feature Encoding Scheme for Machine Learning Based Malware Detection Systems

Authors
Das, Vipin [1 ]
Nair, Binoy B. [2 ]
Thiruvengadathan, Rajagopalan [3 ]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Artificial Intelligence, Coimbatore 641112, India
[2] Amrita Vishwa Vidyapeetham, Dept Elect & Commun Engn, Amrita Sch Engn, Coimbatore 641112, India
[3] Southern Utah Univ, Dept Engn & Technol, Cedar City, UT 84720 USA
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Malware; Codes; Encoding; Machine learning; Feature extraction; Grippers; Static analysis; Computer security; Intrusion detection; Classification algorithms; Detection algorithms; Cybersecurity; categorical encoding; intrusion detection; machine learning; malware classification; malware detection; NETWORK INTRUSION DETECTION; UNSW-NB15 DATA SET; IOT; ALGORITHM; MECHANISM; INTERNET; TAXONOMY; THINGS; SELECTION; DEFENSE;
DOI
10.1109/ACCESS.2024.3420080
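Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract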
Malware detection is an ever-evolving area, given that strides in detection capabilities are matched by radical attempts to bypass detection. As the sophistication of malware continues to increase, the demand for innovative approaches to improve detection capabilities becomes paramount. Machine learning and deep learning models are increasingly used for malware detection; however, one of the most important and frequently overlooked aspects of building such models is feature encoding. This research paper explores the importance of feature encoding in improving the efficiency of threat detection and proposes a novel entropy-based encoding scheme for the categorical features present in the data extracted from malicious inputs. The KDDCUP99, UNSW-NB15 and CIC-Evasive-PDFMal2022 datasets are used to evaluate the effectiveness of the proposed encoding scheme. The results of the proposed encoding scheme are validated against seven other encoding schemes to ascertain its credibility and usability. The efficiency of the proposed system is evaluated by applying different encoded versions of the datasets to train various machine learning models and determining the classification performance of the models on each dataset. The machine learning models trained with the proposed encoding scheme produced stable classification results and outperformed those trained with other encoding schemes when dimensionality reduction was applied to the data. The ensemble classifier trained using the proposed scheme classified the data with an F1 score of 99.99% when the dimension-reduced, entropy-encoded KDD Cup99 dataset was used to build the model. On the CIC-Evasive-PDFMal2022 dataset, the entropy encoding exhibited slightly improved classification metrics, with the ensemble methods yielding a peak F1 score of 99.27%.
We also determined the feature importance values of the features present in the datasets to study how their contribution levels change when multiple categorical encoding schemes are applied to the data.
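The abstract does not state the formula behind the proposed entropy-based encoding, so the following is only an illustrative sketch of what an entropy-driven categorical encoder could look like, not the authors' scheme. It assumes, purely for illustration, that each category value is replaced by the Shannon entropy (in bits) of the class labels observed alongside it. The function name `entropy_encode` and the sample values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_encode(categories, labels):
    """Replace each category with the Shannon entropy (bits) of its labels.

    ASSUMPTION: this mapping is an illustrative guess at an entropy-based
    encoding; the paper's actual scheme is not given in the abstract.
    """
    # Count how often each class label co-occurs with each category value.
    label_counts = defaultdict(Counter)
    for category, label in zip(categories, labels):
        label_counts[category][label] += 1

    # Entropy of the label distribution within each category.
    mapping = {}
    for category, counts in label_counts.items():
        total = sum(counts.values())
        mapping[category] = sum(
            -(n / total) * math.log2(n / total) for n in counts.values()
        )

    # Encoded column plus the learned category -> value mapping.
    return [mapping[c] for c in categories], mapping


# Hypothetical toy column and labels, made up for this example.
protocols = ["tcp", "tcp", "udp", "udp", "icmp"]
classes = ["attack", "normal", "normal", "normal", "attack"]
encoded, mapping = entropy_encode(protocols, classes)
```

In this sketch a category whose rows all share one label maps to 0, while a category split evenly between two labels maps to 1 bit. The mapping would need to be learned on training data only and then reused on test data to avoid label leakage.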
Pages: 91187 - 91216
Page count: 30
Related papers
50 in total
  • [11] A Novel Malware Analysis for Malware Detection and Classification using Machine Learning Algorithms
    Sethi, Kamalakanta
    Chaudhary, Shankar Kumar
    Tripathy, Bata Krishan
    Bera, Padmalochan
    SIN'17: PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON SECURITY OF INFORMATION AND NETWORKS, 2017, : 107 - 113
  • [12] Malware Detection System Based on Machine Learning Methods for Android Operating Systems
    Utku, Anil
    Dogru, Ibrahim Alper
    2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [13] Android malware detection applying feature selection techniques and machine learning
    Mohammad Reza Keyvanpour
    Mehrnoush Barani Shirzad
    Farideh Heydarian
    Multimedia Tools and Applications, 2023, 82 : 9517 - 9531
  • [14] Evolutionary feature selection for machine learning based malware classification
    Kale, Gulsade
    Bostanci, Gazi Erkan
    Celebi, Fatih Vehbi
    ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2024, 56
  • [15] Android malware detection applying feature selection techniques and machine learning
    Keyvanpour, Mohammad Reza
    Shirzad, Mehrnoush Barani
    Heydarian, Farideh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (06) : 9517 - 9531
  • [16] A Graph-Based Feature Generation Approach in Android Malware Detection with Machine Learning Techniques
    Liu, Xiaojian
    Lei, Qian
    Liu, Kehong
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020 (2020)
  • [17] Android Malware Detection Using Genetic Algorithm based Optimized Feature Selection and Machine Learning
    Fatima, Anam
    Maurya, Ritesh
    Dutta, Malay Kishore
    Burget, Radim
    Masek, Jan
    2019 42ND INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2019, : 220 - 223
  • [18] Understanding Update of Machine-Learning-Based Malware Detection by Clustering Changes in Feature Attributions
    Fan, Yun
    Shibahara, Toshiki
    Ohsita, Yuichi
    Chiba, Daiki
    Akiyama, Mitsuaki
    Murata, Masayuki
    ADVANCES IN INFORMATION AND COMPUTER SECURITY, IWSEC 2021, 2021, 12835 : 99 - 118
  • [19] Automated machine learning for deep learning based malware detection
    Brown, Austin
    Gupta, Maanak
    Abdelsalam, Mahmoud
    COMPUTERS & SECURITY, 2024, 137
  • [20] Machine Learning Based Improved Malware Detection Schemes
    Priyadarshan, Pradosh
    Sarangi, Prateek
    Rath, Adyasha
    Panda, Ganapati
    2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 925 - 931