Unveiling DoH tunnel: Toward generating a balanced DoH encrypted traffic dataset and profiling malicious behavior using inherently interpretable machine learning

Times cited: 1

Authors:

Niktabe, Sepideh [1]

Lashkari, Arash Habibi [2]

Roudsari, Arousha Haghighian [3]

Affiliations:

[1] York Univ, Dept EECS, Comp Sci, BCCC, DB2004 Victor Phillip Dahdaleh Bldg, 4700 Keele St, Toronto, ON M3J 1P3, Canada

[2] York Univ, Behav Centr Cybersecur Ctr BCCC, Sch Informat Technol, Toronto, ON, Canada

[3] Gachon Univ, Sch Comp, Songnam, South Korea

Source:

PEER-TO-PEER NETWORKING AND APPLICATIONS | 2024 / Vol. 17 / Issue 01

Keywords:

Domain Name System (DNS); DNS over HTTPS (DoH); DNS tunnel; DoH tunnels; DoH encrypted traffic; Malicious profiling; Attack detection; Network security; FEATURE-SELECTION;

DOI:

10.1007/s12083-023-01597-4

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Encrypted domain name resolution can reduce the risk of privacy leakage for Internet users. However, it may also prevent network administrators from detecting suspicious communications. Profiling malicious and benign DNS-over-HTTPS (DoH) traffic can provide deeper insight into their behavior, improving the identification and characterization of user activity. In this research, we propose a new behavioral profiling model built on a high-performing, inherently interpretable method. We analyzed several inherently interpretable methods, including Linear Regression, Decision Trees (DT), and Random Forest (RF), to determine which provides the most meaningful behavioral profiles. Based on this analysis, DT was selected to profile malicious and benign behavior. To reduce computational cost, improve model performance and interpretability, and prevent overfitting, we introduced a novel feature engineering technique based on mutual information and the correlation coefficient between features to identify the best feature set for behavioral profiling. We also generated a public balanced dataset, 'BCCC-CIRA-CIC-DoHBrw-2020', to evaluate the proposed profiling model. It is derived from the publicly available 'CIRA-CIC-DoHBrw-2020' dataset and was balanced using the SMOTE data balancing technique. The experimental results showed accuracies of 93.93% and 94.86% for the created malicious and benign profiles, respectively.
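The abstract does not spell out the exact feature selection procedure, so the following is only a minimal sketch, assuming a common two-stage filter: rank features by their mutual information with the benign/malicious label, then skip any feature whose absolute Pearson correlation with an already-selected feature exceeds a threshold. The function names, the 0.9 threshold, the use of Pearson's r, and the assumption that features are discretized (binned) before computing mutual information are all illustrative choices, not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(columns, labels, corr_threshold=0.9, top_k=None):
    """Hypothetical relevance-then-redundancy filter.

    columns: dict mapping feature name -> list of discrete (binned) values.
    Features are ranked by mutual information with the label; a feature is
    skipped when |r| with any already-selected feature exceeds corr_threshold.
    """
    ranked = sorted(
        columns,
        key=lambda f: mutual_information(columns[f], labels),
        reverse=True,
    )
    selected = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[s])) <= corr_threshold for s in selected):
            selected.append(f)
        if top_k is not None and len(selected) == top_k:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical binned flow features; 0 = benign, 1 = malicious.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    cols = {
        "flow_duration": [0, 0, 0, 0, 1, 1, 1, 1],
        "flow_duration_copy": [0, 0, 0, 0, 1, 1, 1, 1],
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(select_features(cols, labels))  # ['flow_duration', 'noise']
```

In this toy example the duplicated feature is discarded as redundant, while the uninformative but uncorrelated feature is still kept; a `top_k` cap or a minimum mutual-information cutoff would be needed to drop it as well.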
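SMOTE (Synthetic Minority Over-sampling Technique) is a published method; the sketch below is a minimal, dependency-free illustration of its core idea, not the authors' dataset-generation pipeline. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, the random choice of base sample, and the parameters `k` and `seed` are assumptions for illustration; in practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class feature vectors.

    Each synthetic point lies between a randomly chosen minority sample and
    one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D minority-class feature vectors.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_new=3, k=2, seed=42):
        print(point)
```

To balance a two-class dataset, `n_new` would be set to the difference between the majority and minority class counts, and the synthetic rows would be appended to the minority class.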


Pages: 507-531

Page count: 25
