Leveraging n-gram neural embeddings to improve deep learning DGA detection

Cited by: 5

Authors:

Morbidoni, Christian [1]

Spalazzi, Luca [2]

Teti, Antonio [1]

Cucchiarelli, Alessandro [2]

Affiliations:

[1] Univ G d'Annunzio, Pescara, Italy

[2] Univ Politecn Marche, Ancona, Italy

Source:

37th Annual ACM Symposium on Applied Computing | 2022

Keywords:

Domain Generation Algorithms; LSTM; Neural Embeddings; Deep Learning; n-grams

DOI:

10.1145/3477314.3507269

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Several malware families need to establish a connection with a Command and Control (C&C) server. To avoid detection, these servers "hide" behind domain names that change periodically according to a specific Domain Generation Algorithm (DGA). The malware on an infected host runs the same DGA to issue DNS queries and reach the C&C server. Identifying the "malicious" domain names used in DNS queries is therefore crucial for detecting such malware. Various machine learning techniques have been applied to this task, and deep learning techniques have recently proved especially effective. To perform well, however, these techniques require very large, labelled training datasets, and building such datasets is difficult and does not scale, particularly where collecting malicious domain names is concerned. In this paper, we therefore explore unsupervised character n-gram embeddings as a way to improve the performance of a deep learning DGA classifier. The embeddings are trained on a large dataset of benign names, which opens up the possibility of training the classifier on a small dataset that requires only a small number of malicious names. We then present a series of experiments that use the same embedding for classifiers trained on datasets of increasing size. These experiments show that the embedding is particularly effective for classifiers trained on small datasets containing a small number of malicious names.
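As a rough illustration of the preprocessing that the abstract implies, the sketch below splits domain names into overlapping character n-grams, builds a vocabulary from benign names only, and encodes a name as a fixed-length index sequence that an embedding layer followed by an LSTM could consume. The abstract does not give the n-gram size, the tokenization rules, or the sequence length, so `n=3`, `max_len=8`, the `<pad>`/`<unk>` tokens, and all function names are assumptions made for this example and should not be read as the authors' implementation.

```python
from collections import Counter


def char_ngrams(domain: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of a domain name.

    Names shorter than n are returned as a single token (an assumption).
    """
    name = domain.lower().strip(".")
    if len(name) < n:
        return [name] if name else []
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def build_vocab(domains, n: int = 3, min_count: int = 1) -> dict[str, int]:
    """Map each n-gram seen in the (benign) corpus to an integer index."""
    counts = Counter(g for d in domains for g in char_ngrams(d, n))
    # Index 0 is reserved for padding, index 1 for unseen n-grams.
    vocab = {"<pad>": 0, "<unk>": 1}
    for gram, count in sorted(counts.items()):
        if count >= min_count:
            vocab[gram] = len(vocab)
    return vocab


def encode(domain: str, vocab: dict[str, int], n: int = 3, max_len: int = 8) -> list[int]:
    """Encode a domain as a truncated or padded sequence of n-gram indices."""
    ids = [vocab.get(g, vocab["<unk>"]) for g in char_ngrams(domain, n)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical benign names; a real corpus would be far larger.
benign = ["google.com", "example.org"]
vocab = build_vocab(benign)

print(char_ngrams("google.com"))
print(encode("xkqzvgoo.com", vocab))
```

Building the vocabulary from benign names alone mirrors the idea in the abstract that the embedding stage needs no malicious samples; n-grams that never occur in benign traffic simply fall back to the `<unk>` index here.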

Pages: 995-1004

Number of pages: 10
