Improving ML Detection of IoT Botnets using Comprehensive Data and Feature Sets

Cited by: 2
Authors
Mehra, Misha [1]
Paranjape, Jay N. [1]
Ribeiro, Vinay J. [1]
Affiliations
[1] Indian Inst Technol Delhi, Comp Sci & Engn, Delhi, India
Keywords
IoT Botnet; IoT Security; Machine Learning; Malware Analysis; Sandboxing
DOI
10.1109/COMSNETS51098.2021.9352943
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In recent times, the world has seen a tremendous increase in the number of attacks on IoT devices. A majority of these have been botnet attacks, in which an army of compromised IoT devices is used to launch DDoS attacks on targeted systems. In this paper, we study how the choice of dataset and of extracted features determines the performance of a Machine Learning (ML) model tasked with classifying Linux binaries (ELFs) as benign or malicious. Our work focuses on Linux systems since embedded Linux is the more popular choice for building today's IoT devices and systems. We propose using four types of files as the dataset for any ML model: system files, IoT application files, IoT botnet files and general malware files. Further, we propose using static, dynamic and network features for the classification task. We show that existing methods leave out one or more of these feature types or file types, and that our model therefore outperforms them in accuracy when detecting these files. While enhancing the dataset adds to the robustness of a model, utilizing all three types of features decreases the false positive and false negative rates non-trivially. We employ an exhaustive scenario-based method for evaluating an ML model and show the importance of including each of the proposed file types in a dataset. We also analyze the features and try to explain their importance for a model, using trends observed in different benign and malicious files. We perform feature extraction using the open-source Limon sandbox, which prior to this work had been tested only on Ubuntu 14. We installed and configured it for Ubuntu 18 and have shared the documentation of this setup on GitHub.
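The abstract reports results as false positive and false negative rates. As a reminder of how these two rates are defined for a benign-versus-malicious classifier, here is a minimal sketch. The function name `error_rates` and the toy labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate).

    Labels: 1 = malicious, 0 = benign.
    FPR = FP / (FP + TN): share of benign files flagged as malicious.
    FNR = FN / (FN + TP): share of malicious files missed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when one class is absent.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy labels, invented for illustration only (not data from the paper).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

fpr, fnr = error_rates(y_true, y_pred)
print(fpr, fnr)  # 0.25 0.5
```

In this toy case one of four benign files is flagged (FPR 0.25) and two of four malicious files are missed (FNR 0.5).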
Pages: 438-446
Number of pages: 9