Leveraging n-gram neural embeddings to improve deep learning DGA detection

Cited by: 5
Authors
Morbidoni, Christian [1]
Spalazzi, Luca [2]
Teti, Antonio [1]
Cucchiarelli, Alessandro [2]
Affiliations
[1] Univ G dAnnunzio, Pescara, Italy
[2] Univ Politecn Marche, Ancona, Italy
Keywords
Domain Generation Algorithms; LSTM; Neural Embeddings; Deep Learning; n-grams;
DOI
10.1145/3477314.3507269
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Several families of malware need to establish a connection with a Command and Control (C&C) server. To avoid detection, these servers "hide" behind domain names that are changed periodically according to a specific Domain Generation Algorithm (DGA). The malware that has infected a particular host uses the same DGA to issue DNS queries in order to connect to the C&C server. Identifying the "malicious" domain names used in DNS queries is therefore crucial for detecting such malware. Various machine learning techniques have been used for this purpose, and deep learning techniques have recently proved especially effective. However, to achieve good results, these techniques require very large labelled training datasets, and constructing such datasets, especially collecting malicious domain names, is a very difficult and non-scalable task. In this paper, we therefore explore the possibility of exploiting unsupervised character n-gram embeddings to improve the performance of a deep learning DGA classifier. The embeddings are trained on a large dataset of benign names, which opens up the possibility of using a small classifier training dataset that requires only a small number of malicious names. We then present a series of experiments that use the same embedding for classifiers trained on datasets of increasing size. These experiments show that the embedding is particularly effective for classifiers trained on small datasets containing few malicious names.
Pages: 995-1004
Page count: 10
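As a rough illustration of the character n-gram idea mentioned in the abstract, the following minimal sketch splits domain names into overlapping n-grams and maps them to integer ids using a vocabulary built from benign names only. The abstract does not state the n-gram size, the tokenization, or the embedding algorithm, so `n=3`, the `<unk>` index, the function names, and the sample domains are all assumptions made for illustration; training the embeddings themselves and the LSTM classifier is outside this sketch.

```python
from collections import Counter


def char_ngrams(name: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of a domain name."""
    name = name.lower()
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def build_vocab(names, n: int = 3, min_count: int = 1) -> dict[str, int]:
    """Build an n-gram -> id mapping from a corpus of (benign) names."""
    counts = Counter(g for name in names for g in char_ngrams(name, n))
    # Index 0 is reserved for n-grams never seen in the benign corpus
    # (an assumption of this sketch, not something the abstract specifies).
    vocab = {"<unk>": 0}
    for gram, count in counts.most_common():
        if count >= min_count:
            vocab[gram] = len(vocab)
    return vocab


def encode(name: str, vocab: dict[str, int], n: int = 3) -> list[int]:
    """Map a domain name to the id sequence an embedding layer would consume."""
    return [vocab.get(g, 0) for g in char_ngrams(name, n)]


# Hypothetical benign corpus; a real one would be far larger.
benign = ["google.com", "wikipedia.org"]
vocab = build_vocab(benign)
ids = encode("goggle.com", vocab)
```

Here the first three trigrams of `goggle.com` (`gog`, `ogg`, `ggl`) do not occur in the benign corpus and map to the unknown index, while the remaining trigrams are shared with `google.com`. An id sequence of this kind is what an unsupervised embedding model could be trained on before the resulting vectors are passed to a classifier.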