Evaluating the Impact of Data Preprocessing Techniques on the Performance of Intrusion Detection Systems

Cited by: 1
Authors
Santos, Kelson Carvalho [1 ,2 ]
Miani, Rodrigo Sanches [2 ]
Silva, Flavio de Oliveira [2 ,3 ]
Affiliations
[1] Fed Inst Piaui IFPI, Comp Dept, Ave Pedro Freitas,1020, BR-64018000 Teresina, PI, Brazil
[2] Fed Univ Uberlandia UFU, Fac Comp FACOM, Ave Joao Naves Avila,2121 Campus Santa Monica, BR-38400902 Uberlandia, MG, Brazil
[3] Univ Minho, Sch Engn, Dept Informat DI, R Univ, P-4710057 Braga, Portugal
Keywords
Data preprocessing techniques; IDS; Intrusion detection system; Machine learning; ML-based IDS; Statistical test; Taxonomy
DOI
10.1007/s10922-024-09813-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The development of Intrusion Detection Systems using Machine Learning techniques (ML-based IDS) has emerged as an important research topic in the cybersecurity field. However, there is a noticeable absence of systematic studies on the usability of such systems in real-world applications. This paper analyzes the impact of data preprocessing techniques on the performance of ML-based IDS using two public datasets, UNSW-NB15 and CIC-IDS2017. Specifically, we evaluated the effects of data cleaning, encoding, and normalization techniques on the performance of binary and multiclass intrusion detection models. The work addresses two questions: how data preprocessing techniques affect the performance of ML-based IDS, and how the performance of different ML-based IDS varies under those techniques. To answer them, we implemented a machine learning pipeline that applies the data preprocessing techniques in different scenarios. The results, analyzed using the Friedman statistical test and the Nemenyi post-hoc test, revealed significant differences among groups of data preprocessing techniques and among ML-based IDS, according to the evaluation metrics. However, these differences were not observed in multiclass scenarios for data preprocessing techniques. Additionally, ML-based IDS exhibited varying performance in binary and multiclass classification. Our investigation therefore presents insights into the efficacy of different data preprocessing techniques for building robust and accurate intrusion detection models.
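The Friedman test named in the abstract compares several treatments (here, preprocessing options) by ranking them within each scenario and checking whether the rank sums differ more than chance would suggest. The following is a minimal pure-Python sketch of the test statistic only, not the authors' pipeline: the function names, the option labels, and the F1 scores are all hypothetical, no tie correction is applied, and the p-value and the Nemenyi post-hoc step are omitted.

```python
from typing import Dict, List


def average_ranks(scores: List[float]) -> List[float]:
    """Rank scores within one scenario: rank 1 is the highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(results: Dict[str, List[float]]) -> float:
    """Friedman chi-square statistic (no tie correction) for k treatments over n scenarios."""
    names = list(results)
    k = len(names)
    n = len(results[names[0]])
    rank_sums = [0.0] * k
    for row in range(n):
        ranks = average_ranks([results[name][row] for name in names])
        for j, rank in enumerate(ranks):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical F1 scores: one list per normalization option, one entry per scenario.
results = {
    "none": [0.80, 0.75, 0.82, 0.78],
    "min_max": [0.85, 0.79, 0.88, 0.83],
    "z_score": [0.90, 0.81, 0.91, 0.86],
}
q = friedman_statistic(results)
print(q)  # 8.0
```

In this made-up data one option wins every scenario, so the statistic reaches its maximum n(k - 1) = 8. In practice a library routine such as `scipy.stats.friedmanchisquare` would normally be used, since it also handles ties and returns a p-value.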
Pages: 54
Related papers
50 in total
  • [1] Evaluating the Impact of Data Preprocessing Techniques on the Performance of Intrusion Detection Systems
    Kelson Carvalho Santos
    Rodrigo Sanches Miani
    Flávio de Oliveira Silva
    [J]. Journal of Network and Systems Management, 2024, 32
  • [2] Application and Performance Analysis of Data Preprocessing for Intrusion Detection System
    Jiang, Shuai
    Xu, Xiaolong
    [J]. SCIENCE OF CYBER SECURITY, SCISEC 2019, 2019, 11933 : 163 - 177
  • [3] Data Preprocessing for Network Intrusion Detection
    Li, Li
    Ye, Yuan
    [J]. INFORMATION TECHNOLOGY FOR MANUFACTURING SYSTEMS, PTS 1 AND 2, 2010 : 867 - 871
  • [4] Intrusion detection taxonomy and data preprocessing mechanisms
    Al-Utaibi, Khaled A.
    El-Alfy, El-Sayed M.
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (03) : 1369 - 1383
  • [5] Data warehousing and data mining techniques for intrusion detection systems
    Anoop Singhal
    Sushil Jajodia
    [J]. Distributed and Parallel Databases, 2006, 20 : 149 - 166
  • [6] Data warehousing and data mining techniques for intrusion detection systems
    Singhal, Anoop
    Jajodia, Sushil
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2006, 20 (02) : 149 - 166
  • [7] Approaches and Data Processing Techniques for Intrusion Detection Systems
    Srinivasu, Pakkurthi
    Avadhani, P. S.
    Korimilli, Vishal
    Ravipati, Prudhvi
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (12) : 181 - 186
  • [8] Data mining for intrusion detection: Techniques, applications and systems
    Pei, H
    Upadhyaya, SJ
    Farooq, F
    Govindaraju, V
    [J]. 20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004 : 877 - 877
  • [9] Comparative study of ML models for IIoT intrusion detection: impact of data preprocessing and balancing
    Eid, Abdulrahman Mahmoud
    Soudan, Bassel
    Nassif, Ali Bou
    Injadat, Mohammadnoor
    [J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (13) : 6955 - 6972
  • [10] Comparative study of ML models for IIoT intrusion detection: impact of data preprocessing and balancing
    Abdulrahman Mahmoud Eid
    Bassel Soudan
    Ali Bou Nassif
    MohammadNoor Injadat
    [J]. Neural Computing and Applications, 2024, 36 : 6955 - 6972