DNS dataset for malicious domains detection

Cited by: 7
Authors
Marques, Claudio [1]
Malta, Silvestre [2]
Magalhaes, Joao Paulo [3]
Affiliations
[1] Politecn Viana Do Castelo, Escola Super Tecnol & Gestao, P-4900348 Viana Do Castelo, Portugal
[2] Politecn Viana Do Castelo, Escola Super Tecnol & Gestao, ADiT Lab, P-4900348 Viana Do Castelo, Portugal
[3] Politecn Porto, Escola Super Tecnol & Gestao, CIICESI, Felgueiras, Portugal
Source
DATA IN BRIEF | 2021 / Vol. 38
Keywords
DNS; Firewall; Machine learning; Cybersecurity;
DOI
10.1016/j.dib.2021.107342
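Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09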
Abstract
The Domain Name System (DNS) is central to the functioning of the internet. Just as organizations use domain names to provide access to their computational services, malicious actors use domain names to point to services under their control. Distinguishing between malicious and non-malicious domain names is extremely important, as it makes it possible to grant or block access to external services, maximizing the security of the organization and its users. Many DNS firewall solutions exist today. Most of them rely on lists of known malicious domains that are constantly updated. However, this approach can only block known malicious communications, leaving out many others that may be malicious but are not yet known. Adopting machine learning to classify domains contributes to the detection of domains that are not yet on the block list. The dataset described in this manuscript is meant for supervised machine learning-based analysis of malicious and non-malicious domain names. The dataset was created from scratch, using publicly available DNS logs of both malicious and non-malicious domain names. Using the domain name as input, 34 features were obtained. Features such as the domain name entropy, number of strange characters, and domain name length were obtained directly from the domain name. Other features, such as the domain name creation date, Internet Protocol (IP) address, open ports, and geolocation, were obtained from data enrichment processes (e.g., Open Source Intelligence (OSINT)). The class was determined from the data source (malicious DNS log files and non-malicious DNS log files). The dataset consists of data from approximately 90,000 domain names and is balanced, with 50% non-malicious and 50% malicious domain names. (C) 2021 Published by Elsevier Inc.
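As an illustration of the features that the abstract says were obtained directly from the domain name, the following is a minimal sketch in Python. It is not the authors' extraction code. The function names `shannon_entropy` and `lexical_features` are hypothetical, entropy is assumed to mean character-level Shannon entropy in bits, and "strange characters" is assumed to mean any character other than ASCII letters, digits, hyphens, and dots; the abstract does not define either term.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum of -p * log2(p) over the distinct characters in the string.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain: str) -> dict:
    """Compute three lexical features from a domain name string.

    Assumption: a "strange" character is anything other than an ASCII
    letter, an ASCII digit, '-' or '.'. The dataset's own definition
    may differ.
    """
    # Normalize case and drop a trailing root dot, if present.
    name = domain.lower().rstrip(".")
    strange = sum(
        1
        for ch in name
        if not (ch.isascii() and (ch.isalnum() or ch in "-."))
    )
    return {
        "length": len(name),
        "entropy": shannon_entropy(name),
        "strange_chars": strange,
    }


if __name__ == "__main__":
    for d in ("example.com", "x9f3k2q7zt1.info"):
        print(d, lexical_features(d))
```

Random-looking names, such as those produced by domain generation algorithms, tend to have higher character entropy than dictionary-based names, which is one reason such a feature can be useful to a classifier. The enrichment features (creation date, IP address, open ports, geolocation) require external lookups and are not covered by this sketch.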
Pages: 13
Related papers
  • [1] DNS Traffic Analysis for Malicious Domains Detection
    Ghafir, Ibrahim
    Prenosil, Vaclav
    [J]. 2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN) 2015, 2015, : 613 - 618
  • [2] Detection of Malicious Domains Using Passive DNS with XGBoost
    Silveira, Marcos Rogerio
    Cansian, Adriano Mauro
    Kobayashi, Hugo Koji
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2020, : 59 - 61
  • [3] Detection of Newly Registered Malicious Domains through Passive DNS
    Silveira, Marcos Rogerio
    da Silva, Leandro Marcos
    Cansian, Adriano Mauro
    Kobayashi, Hugo Koji
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3360 - 3369
  • [4] A Survey on Malicious Domains Detection through DNS Data Analysis
    Zhauniarovich, Yury
    Khalil, Issa
    Yu, Ting
    Dacier, Marc
    [J]. ACM COMPUTING SURVEYS, 2018, 51 (04)
  • [5] An Imbalanced Malicious Domains Detection Method Based on Passive DNS Traffic Analysis
    Liu, Zhenyan
    Zeng, Yifei
    Zhang, Pengfei
    Xue, Jingfeng
    Zhang, Ji
    Liu, Jiangtao
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2018
  • [6] Classifying Malicious Domains using DNS Traffic Analysis
    Mahdavifar, Samaneh
    Maleki, Nasim
    Lashkari, Arash Habibi
    Broda, Matt
    Razavi, Amir H.
    [J]. 2021 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS DASC/PICOM/CBDCOM/CYBERSCITECH 2021, 2021, : 60 - 67
  • [7] Comparison of DNS Based Methods for Detecting Malicious Domains
    Paz, Eyal
    Gudes, Ehud
    [J]. CYBER SECURITY CRYPTOGRAPHY AND MACHINE LEARNING (CSCML 2020), 2020, 12161 : 219 - 236
  • [8] Detecting Malicious Domains by Massive DNS Traffic Data Analysis
    Tian, Shiqi
    Fang, Cheng
    Liu, Jun
    Lei, Zhenming
    [J]. 2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC), VOL. 1, 2016, : 130 - 133
  • [9] XGBoost Applied to Identify Malicious Domains Using Passive DNS
    Silveira, Marcos Rogerio
    da Silva, Leandro Marcos
    Cansian, Adriano Mauro
    Kobayashi, Hugo Koji
    [J]. 2020 IEEE 19TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2020
  • [10] Detection of Malicious Payload Distribution Channels in DNS
    Kara, A. Mert
    Binsalleeh, Hamad
    Mannan, Mohammad
    Youssef, Amr
    Debbabi, Mourad
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2014, : 853 - 858