Leveraging n-gram neural embeddings to improve deep learning DGA detection

Cited by: 5

Authors:

Morbidoni, Christian [1]

Spalazzi, Luca [2]

Teti, Antonio [1]

Cucchiarelli, Alessandro [2]

Affiliations:

[1] Univ G dAnnunzio, Pescara, Italy

[2] Univ Politecn Marche, Ancona, Italy

Source:

37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2022

Keywords:

Domain Generation Algorithms; LSTM; Neural Embeddings; Deep Learning; n-grams

DOI:

10.1145/3477314.3507269

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Several families of malware need to establish a connection with a Command and Control (C&C) server. To avoid detection, these servers "hide" behind domain names that change periodically according to a specific Domain Generation Algorithm (DGA). The malware that has infected a host runs the same DGA to generate the current candidate domain names and issues DNS queries for them in order to reach the C&C server. Identifying the "malicious" domain names that appear in DNS queries is therefore crucial for detecting such malware. Various machine learning techniques have been applied to this task, and deep learning techniques have recently proved especially effective. However, to achieve good results, these techniques require very large labelled training datasets, and building such datasets, particularly collecting malicious domain names, is a very difficult and non-scalable task. In this paper, we therefore explore the possibility of exploiting unsupervised character n-gram embeddings to improve the performance of a deep learning DGA classifier. The embeddings are trained on a large dataset of benign domain names, which opens up the possibility of training the classifier on a small dataset that requires only a small number of malicious names. We then present a series of experiments in which the same embedding is used for classifiers trained on datasets of increasing size. These experiments show that the embedding is particularly effective for classifiers trained on small datasets containing few malicious names.
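The abstract relies on the infected host and the C&C operator deriving the same domain names independently. The Python sketch below illustrates that idea with a toy, date-seeded DGA. It is hypothetical and is neither a real malware family's algorithm nor code from the paper; the seed string, the use of SHA-256, the 12-character label length and the `.com` suffix are all illustrative assumptions.

```python
import hashlib
from datetime import date


def generate_domains(seed: str, day: date, count: int = 5,
                     length: int = 12, tld: str = ".com") -> list[str]:
    """Toy date-seeded DGA (hypothetical): same seed + same day -> same domain list."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 (bytes in a SHA-256 digest)")
    domains = []
    for i in range(count):
        # Combine the shared secret seed, the current date and a counter.
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).digest()
        # Map each of the first `length` digest bytes to a lowercase letter a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    day = date(2022, 4, 25)
    # The infected host and the C&C operator run the same function independently...
    bot_view = generate_domains("hypothetical-seed", day)
    operator_view = generate_domains("hypothetical-seed", day)
    # ...and obtain identical candidate lists without ever communicating.
    assert bot_view == operator_view
    print(bot_view)
```

Because both sides compute the list from the same seed and date, the operator only needs to register one of the names in advance, while a defender watching DNS traffic sees a stream of random-looking queries. Those random-looking names are what a DGA classifier tries to flag.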
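To make the input representation more concrete, the sketch below shows one plausible way to turn a domain name into a sequence of character n-gram vectors that an LSTM classifier could consume. It assumes trigrams with `<`/`>` boundary markers (in the style of fastText subword units), removal of the TLD, a vocabulary built from benign names only, and id 0 for unseen n-grams. These choices and all function names are hypothetical, not taken from the paper. The embedding table is filled with random placeholder values; in the setting the abstract describes, its rows would instead be learned without supervision from the large benign dataset.

```python
import random


def char_ngrams(domain: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of a domain, TLD removed, with < > boundary markers."""
    # Assumption: drop only the last label (the TLD) and keep everything before it.
    name = domain.lower().rsplit(".", 1)[0]
    padded = f"<{name}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_vocab(benign_domains: list[str], n: int = 3) -> dict[str, int]:
    """Assign an integer id to every n-gram seen in benign names; id 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for domain in benign_domains:
        for gram in char_ngrams(domain, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def encode(domain: str, vocab: dict[str, int], n: int = 3) -> list[int]:
    """Map a domain to its n-gram ids, using 0 for n-grams absent from the vocabulary."""
    return [vocab.get(gram, 0) for gram in char_ngrams(domain, n)]


def random_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Placeholder embedding table; a real system would learn these rows from benign names."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids: list[int], table: list[list[float]]) -> list[list[float]]:
    """Look up one vector per n-gram id, giving a (sequence_length x dim) input for a recurrent model."""
    return [table[i] for i in ids]


if __name__ == "__main__":
    vocab = build_vocab(["google.com", "example.org", "wikipedia.org"])
    table = random_table(len(vocab), dim=8)
    seq = embed(encode("xjwqkzp.net", vocab), table)
    print(len(seq), "vectors of size", len(seq[0]))
```

Building the vocabulary from benign names only mirrors the abstract's premise that the embedding needs no malicious examples. Under this sketch, n-grams that never occur in benign names all share the unknown id.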


Pages: 995-1004

Number of pages: 10
