A Novel Feature Encoding Scheme for Machine Learning Based Malware Detection Systems

Cited by: 0
Authors
Das, Vipin [1]
Nair, Binoy B. [2]
Thiruvengadathan, Rajagopalan [3]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Artificial Intelligence, Coimbatore 641112, India
[2] Amrita Vishwa Vidyapeetham, Dept Elect & Commun Engn, Amrita Sch Engn, Coimbatore 641112, India
[3] Southern Utah Univ, Dept Engn & Technol, Cedar City, UT 84720 USA
Source
IEEE Access | 2024 / Vol. 12
Keywords
Malware; Codes; Encoding; Machine learning; Feature extraction; Grippers; Static analysis; Computer security; Intrusion detection; Classification algorithms; Detection algorithms; Cybersecurity; categorical encoding; intrusion detection; machine learning; malware classification; malware detection; NETWORK INTRUSION DETECTION; UNSW-NB15 DATA SET; IOT; ALGORITHM; MECHANISM; INTERNET; TAXONOMY; THINGS; SELECTION; DEFENSE;
DOI
10.1109/ACCESS.2024.3420080
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Malware detection is an ever-evolving area, given that strides in detection capabilities are matched by radical attempts to bypass detection. As the sophistication of malware continues to increase, the demand for innovative approaches to improve detection capabilities becomes paramount. Machine learning and deep learning models are increasingly used for malware detection; however, one of the most important and frequently overlooked aspects of building such models is feature encoding. This paper explores the importance of feature encoding for improving the efficiency of threat detection and proposes a novel entropy-based encoding scheme for the categorical features present in data extracted from malicious inputs. The KDDCUP99, UNSW-NB15 and CIC-Evasive-PDFMal2022 datasets are used to evaluate the effectiveness of the proposed encoding scheme. Its results are validated against seven other encoding schemes to ascertain its credibility and usability. The efficiency of the proposed system is evaluated by training various machine learning models on differently encoded versions of the datasets and determining the classification performance of the models on each dataset. The machine learning models trained with the proposed encoding scheme produced stable classification results and outperformed those trained with other encoding schemes when dimensionality reduction was applied to the data. The ensemble classifier trained using the proposed scheme classified the data with an F1 score of 99.99% when the dimension-reduced, entropy-encoded KDDCUP99 dataset was used to build the model. On the CIC-Evasive-PDFMal2022 dataset, entropy encoding exhibited slightly improved classification metrics, with the ensemble methods yielding a peak F1 score of 99.27%.
We have also determined the feature importance values of the features in the datasets to study how the contribution of each feature changes when multiple categorical encoding schemes are applied to the data.
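The abstract does not spell out how the entropy-based encoding is computed, so the following Python sketch is only one plausible reading, not the authors' method. It assumes each categorical value is replaced by the Shannon entropy (in bits) of the class labels observed with that value. The function name `conditional_entropy_encoding` and the toy `protocol` and `label` columns are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy_encoding(categories, labels):
    """Map each category to the Shannon entropy (bits) of its labels.

    ASSUMPTION: this is an illustrative interpretation of "entropy-based
    encoding"; the paper's actual formula may differ.
    """
    # Count how often each class label co-occurs with each category.
    label_counts = defaultdict(Counter)
    for cat, lab in zip(categories, labels):
        label_counts[cat][lab] += 1

    encoding = {}
    for cat, counts in label_counts.items():
        total = sum(counts.values())
        # H = -sum(p * log2(p)) over the labels seen with this category.
        encoding[cat] = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
    return encoding


# Hypothetical toy data: a categorical feature and its class labels.
protocol = ["tcp", "tcp", "udp", "udp", "icmp", "tcp"]
label = ["attack", "normal", "normal", "normal", "attack", "attack"]

mapping = conditional_entropy_encoding(protocol, label)
encoded = [mapping[p] for p in protocol]  # numeric column for a model
```

Under this reading, a category seen with only one class maps to 0, while a category with mixed labels maps to a larger value. Like any target-dependent encoding, the mapping would need to be fitted on training data only to avoid label leakage.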
Pages: 91187-91216
Page count: 30