Unveiling DoH tunnel: Toward generating a balanced DoH encrypted traffic dataset and profiling malicious behavior using inherently interpretable machine learning

Cited by: 1
Authors
Niktabe, Sepideh [1]
Lashkari, Arash Habibi [2]
Roudsari, Arousha Haghighian [3]
Affiliations
[1] York Univ, Dept EECS, Comp Sci, BCCC, DB2004 Victor Phillip Dahdaleh Bldg,4700 Keele St, Toronto, ON M3J 1P3, Canada
[2] York Univ, Behav Centr Cybersecur Ctr BCCC, Sch Informat Technol, Toronto, ON, Canada
[3] Gachon Univ, Sch Comp, Songnam, South Korea
Keywords
Domain Name System (DNS); DNS over HTTPS (DoH); DNS tunnel; DoH tunnels; DoH encrypted traffic; Malicious profiling; Attack detection; Network security; Feature selection
DOI
10.1007/s12083-023-01597-4
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Encrypted domain name resolution can reduce the risk of privacy leakage for Internet users. However, it may also prevent network administrators from detecting suspicious communications. Profiling malicious and benign DNS-over-HTTPS (DoH) traffic can provide deeper insights into their behaviors, improving user activity identification and characterization. In this research, we proposed a new behavioral profiling model by selecting a method with high performance that is inherently interpretable. The inherently interpretable methods, including Linear Regression, Decision Trees (DT), and Random Forest (RF), were analyzed for understanding and providing more meaningful behavioral profiles. Based on the analysis, DT was selected to profile the malicious and benign behavior. To reduce the computational cost, improve the model performance and interpretability, and prevent overfitting issues, we introduced a novel feature engineering technique based on mutual information and the correlation coefficient between features to identify the best feature set for behavioral profiling. We also generated a public balanced dataset for analyzing the performance of the proposed profiling model, 'BCCC-CIRA-CIC-DoHBrw-2020'. This dataset is based on 'CIRA-CIC-DoHBrw-2020' which is a publicly available dataset. We utilized the SMOTE data balancing technique to generate the mentioned dataset. The experimental results showed an accuracy of 93.93% and 94.86% for the created malicious and benign profiles, respectively.
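
The abstract describes feature selection driven by mutual information and the correlation coefficient between features, but it does not give the procedure. The sketch below is one plausible reading, not the authors' method: rank features by mutual information with the class label, then greedily keep a feature only if it is not strongly correlated with one already kept. The feature names, the equal-width binning, and the `corr_threshold` and `min_mi` parameters are all illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def discretize(values, bins=4):
    """Equal-width binning so continuous features can be scored with MI."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def select_features(features, labels, corr_threshold=0.9, min_mi=0.01, bins=4):
    """Greedy selection: high MI with the label, low correlation with picks.

    `features` maps a feature name to a list of numeric values.
    The thresholds are assumed values for illustration only.
    """
    scores = {name: mutual_information(discretize(col, bins), labels)
              for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = []
    for name in ranked:
        if scores[name] <= min_mi:
            continue  # carries almost no information about the label
        if all(abs(pearson(features[name], features[kept])) < corr_threshold
               for kept in selected):
            selected.append(name)
    return selected, scores


# Toy data with hypothetical flow features (0 = benign, 1 = malicious).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "pkt_len_mean": [1, 2, 1, 2, 8, 9, 8, 9],        # informative
    "pkt_len_copy": [2, 4, 2, 4, 16, 18, 16, 18],    # redundant duplicate
    "noise":        [5, 1, 5, 1, 5, 1, 5, 1],        # unrelated to label
}
selected, scores = select_features(features, labels)
print(selected)
```

On this toy input the redundant and uninformative columns are dropped and only one feature survives. In practice the selected columns would then feed an interpretable classifier such as a decision tree, with class balancing (the abstract names SMOTE) applied to the training data beforehand.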
Pages: 507-531
Page count: 25