A Comprehensive Empirical Analysis of Data Sets, Regression-Based Feature Selectors, and Linear SVM Classifiers for Intrusion Detection Systems

Cited by: 2

Authors:

Azimjonov, Jahongir [1]

Kim, Taehong [2]

Affiliations:

[1] Andijan State Univ, Dept Informat Technol, Andijan 170100, Uzbekistan

[2] Chungbuk Natl Univ, Sch Informat & Commun Engn, Cheongju 28644, South Korea

Source:

IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / Issue 21

Funding:

National Research Foundation, Singapore;

Keywords:

Feature extraction; Accuracy; Support vector machines; Intrusion detection; Internet of Things; Classification algorithms; Surveys; Efficient and relevant features; enhancing Internet of Things (IoT) security; intrusion detection system (IDS) data sets; IDSs; linear classifiers [CSVMLK, linear support vector machine (LSVM), stochastic gradient descent classification (SGDC)]; regression-based feature selection

DOI:

10.1109/JIOT.2024.3415499

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Machine learning (ML)-based intrusion detection systems (IDSs) are crucial in safeguarding computer networks against malicious activities. However, building an optimal (accurate and high-performance) ML-based IDS, which combines data sets, feature selectors, and classifiers, is challenging. This article presents a comprehensive empirical analysis to enhance the effectiveness of IDSs by delving into these three critical components: 1) data sets; 2) feature selection based on regression models; and 3) classification techniques based on linear support vector machines (LSVMs). We begin by evaluating six different data sets commonly used in IDS research, identifying their strengths, limitations, and suitability for real-world scenarios. Next, we explore regression-based feature selectors to identify the most relevant features for intrusion detection, enhancing the accuracy and efficiency of the IDSs. Then, we examine various LSVM classifiers, comparing their performance and highlighting their strengths and weaknesses. By combining these components, this study aims to provide a holistic understanding of the intricate relationship between data sets, regression-based feature selectors, and SVM-based linear classifiers, thus aiding researchers and practitioners in designing more effective and robust IDSs. The empirical analysis conducted in this study employs rigorous evaluation metrics and a comprehensive experimental setup to ensure reliable and unbiased results. The insights gained from our investigation can help guide future research and development efforts toward more efficient and reliable ML-based IDSs.


Pages: 34676-34693

Number of pages: 18