DNS dataset for malicious domains detection

Cited by: 7

Authors:

Marques, Claudio [1]

Malta, Silvestre [2]

Magalhaes, Joao Paulo [3]

Affiliations:

[1] Politecn Viana Do Castelo, Escola Super Tecnol & Gestao, P-4900348 Viana Do Castelo, Portugal

[2] Politecn Viana Do Castelo, Escola Super Tecnol & Gestao, ADiT Lab, P-4900348 Viana Do Castelo, Portugal

[3] Politecn Porto, Escola Super Tecnol & Gestao, CIICESI, Felgueiras, Portugal

Source:

DATA IN BRIEF | 2021 / Vol. 38

Keywords:

DNS; Firewall; Machine learning; Cybersecurity;

DOI:

10.1016/j.dib.2021.107342

Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

The Domain Name System (DNS) is central to the functioning of the internet. Just as organizations use domain names to enable access to their computational services, malicious actors use domain names to point to services under their control. Distinguishing between non-malicious and malicious domain names is extremely important, as it makes it possible to grant or block access to external services, maximizing the security of the organization and its users. Many DNS firewall solutions exist today. Most of them rely on lists of known malicious domains that are constantly updated. This approach can only block known malicious communications, leaving out many others that may be malicious but are not yet known. Adopting machine learning to classify domains contributes to the detection of domains that are not yet on the block list. The dataset described in this manuscript is meant for supervised machine learning-based analysis of malicious and non-malicious domain names. The dataset was created from scratch, using publicly available DNS logs of both malicious and non-malicious domain names. Using the domain name as input, 34 features were obtained. Features such as the domain name entropy, number of strange characters, and domain name length were obtained directly from the domain name. Other features, such as the domain name creation date, Internet Protocol (IP) address, open ports, and geolocation, were obtained from data enrichment processes (e.g. Open Source Intelligence (OSINT)). The class was determined from the data source (malicious DNS log files and non-malicious DNS log files). The dataset consists of data from approximately 90,000 domain names and is balanced between 50% non-malicious and 50% malicious domain names. © 2021 Published by Elsevier Inc.
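
The abstract names three features derived directly from the domain name: entropy, number of strange characters, and length. The following is a minimal Python sketch of how such lexical features could be computed. It is not the authors' extraction code. The function names `shannon_entropy` and `lexical_features` are hypothetical, and the definition of a "strange" character (anything other than an ASCII letter, a digit, or a dot) is an assumption; the dataset may define these features differently.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain: str) -> dict:
    """Compute a few lexical features directly from a domain name.

    Assumption: a "strange" character is anything other than an ASCII
    letter, a digit, or a dot. The dataset's own definition may differ.
    """
    # Normalize: trim whitespace, lowercase, drop a trailing root dot.
    name = domain.strip().lower().rstrip(".")
    strange = sum(
        1 for ch in name
        if ch != "." and not (ch.isascii() and ch.isalnum())
    )
    return {
        "length": len(name),
        "entropy": shannon_entropy(name),
        "strange_chars": strange,
    }


if __name__ == "__main__":
    for d in ("example.com", "x9-q_7zk2.example"):
        print(d, lexical_features(d))
```

Features like these need only the domain string, whereas the enrichment-based features mentioned in the abstract (creation date, IP address, open ports, geolocation) require external lookups and are not covered by this sketch.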

Pages: 13

50 items in total

[1] DNS Traffic Analysis for Malicious Domains Detection
Ghafir, Ibrahim
Prenosil, Vaclav
[J]. 2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN) 2015, 2015, : 613 - 618
[2] Detection of Malicious Domains Using Passive DNS with XGBoost
Silveira, Marcos Rogerio
Cansian, Adriano Mauro
Kobayashi, Hugo Koji
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2020, : 59 - 61
[3] Detection of Newly Registered Malicious Domains through Passive DNS
Silveira, Marcos Rogerio
da Silva, Leandro Marcos
Cansian, Adriano Mauro
Kobayashi, Hugo Koji
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3360 - 3369
[4] A Survey on Malicious Domains Detection through DNS Data Analysis
Zhauniarovich, Yury
Khalil, Issa
Yu, Ting
Dacier, Marc
[J]. ACM COMPUTING SURVEYS, 2018, 51 (04)
[5] An Imbalanced Malicious Domains Detection Method Based on Passive DNS Traffic Analysis
Liu, Zhenyan
Zeng, Yifei
Zhang, Pengfei
Xue, Jingfeng
Zhang, Ji
Liu, Jiangtao
[J]. SECURITY AND COMMUNICATION NETWORKS, 2018,
[6] Classifying Malicious Domains using DNS Traffic Analysis
Mahdavifar, Samaneh
Maleki, Nasim
Lashkari, Arash Habibi
Broda, Matt
Razavi, Amir H.
[J]. 2021 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS DASC/PICOM/CBDCOM/CYBERSCITECH 2021, 2021, : 60 - 67
[7] Comparison of DNS Based Methods for Detecting Malicious Domains
Paz, Eyal
Gudes, Ehud
[J]. CYBER SECURITY CRYPTOGRAPHY AND MACHINE LEARNING (CSCML 2020), 2020, 12161 : 219 - 236
[8] Detecting Malicious Domains by Massive DNS Traffic Data Analysis
Tian, Shiqi
Fang, Cheng
Liu, Jun
Lei, Zhenming
[J]. 2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC), VOL. 1, 2016, : 130 - 133
[9] XGBoost Applied to Identify Malicious Domains Using Passive DNS
Silveira, Marcos Rogerio
da Silva, Leandro Marcos
Cansian, Adriano Mauro
Kobayashi, Hugo Koji
[J]. 2020 IEEE 19TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2020,
[10] Detection of Malicious Payload Distribution Channels in DNS
Kara, A. Mert
Binsalleeh, Hamad
Mannan, Mohammad
Youssef, Amr
Debbabi, Mourad
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2014, : 853 - 858
