Feature Selection For Machine Learning-Based Early Detection of Distributed Cyber Attacks

Cited by: 17
Authors
Feng, Yaokai [1]
Akiyama, Hitoshi [2]
Lu, Liang [2, 4]
Sakurai, Kouichi [3]
Affiliations
[1] Kyushu Univ, Fac Adv Informat Technol, Fukuoka, Fukuoka, Japan
[2] Kyushu Univ, Dept Informat, Fukuoka, Fukuoka, Japan
[3] Kyushu Univ, Fac Informat, Fukuoka, Fukuoka, Japan
[4] Fujitsu Co Ltd, Fukuoka, Fukuoka, Japan
Funding
Japan Science and Technology Agency;
Keywords
distributed cyber attacks; DDoS attacks; machine learning; feature selection; early detection; CLASSIFICATION;
DOI
10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00040
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. Thus, how to detect such attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command and Control) communication between compromised bots and the C&C server is crucially important, because C&C communication takes place in the preparation phase of distributed attacks. Although signature-based attack detection has long been used in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, machine learning (ML)-based detection methods have been studied widely. In those methods, feature selection is very important to detection performance. We previously used up to 55 features to pick out C&C traffic in order to achieve early detection of DDoS attacks. In this work, we try to answer the question "Are all of those features really necessary?" We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are used for feature selection, and SVM and RF (Random Forest) are used to build the classifier. We find that detection performance generally improves as more features are used. However, once the number of features reaches around 40, the detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features that have the biggest influence on classification.
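The experiment outlined in the abstract (rank features by importance, remove the least important ones, and watch how the classifier's performance changes) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, substitutes synthetic data from `make_classification` for the honeypot traffic, and ranks features by the absolute weights of a linear SVM, which is only one plausible reading of "SVM for feature selection". The feature counts tried and all hyperparameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 55 traffic features (NOT the honeypot data).
X, y = make_classification(
    n_samples=600, n_features=55, n_informative=10, n_redundant=5,
    random_state=0,
)

# Assumption: importance = absolute weight a linear SVM gives each feature.
X_scaled = StandardScaler().fit_transform(X)
svm = LinearSVC(C=1.0, max_iter=10000, random_state=0).fit(X_scaled, y)
importance = np.abs(svm.coef_).ravel()
ranking = np.argsort(importance)[::-1]  # most important feature first


def score_top_k(k):
    """Mean 3-fold CV accuracy of a random forest on the top-k features."""
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X[:, ranking[:k]], y, cv=3).mean()


# Remove the least important features step by step and track accuracy.
results = {k: score_top_k(k) for k in (55, 40, 25, 10, 5)}
for k, acc in results.items():
    print(f"{k:2d} features: accuracy = {acc:.3f}")
```

For brevity the ranking is computed once on the full data set; a more careful evaluation would recompute it inside each training fold so that the held-out samples do not influence which features are kept.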
Pages: 173-180
Number of pages: 8