Leveraging n-gram neural embeddings to improve deep learning DGA detection

Cited by: 5
Authors
Morbidoni, Christian [1]
Spalazzi, Luca [2]
Teti, Antonio [1]
Cucchiarelli, Alessandro [2]
Affiliations
[1] Univ G d'Annunzio, Pescara, Italy
[2] Univ Politecn Marche, Ancona, Italy
Keywords
Domain Generation Algorithms; LSTM; Neural Embeddings; Deep Learning; n-grams
DOI
10.1145/3477314.3507269
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Several families of malware need to establish a connection with a Command and Control (C&C) server. To avoid detection, these servers "hide" behind domain names that are changed periodically according to a specific Domain Generation Algorithm (DGA). The malware that has infected a host uses the same DGA to issue DNS queries and thereby reach the C&C server. Identifying the "malicious" domain names used in DNS queries is therefore crucial for detecting such malware. Various machine learning techniques have been used for this purpose, and deep learning techniques in particular have recently proved especially effective. To achieve good results, however, these techniques require very large labelled training datasets, and constructing such datasets, especially collecting malicious domain names, is a very difficult and non-scalable task. In this paper, we therefore explore the possibility of exploiting unsupervised character n-gram embeddings to improve the performance of a deep learning DGA classifier. The embeddings are trained on a large dataset of benign names, which opens up the possibility of using a small classifier training dataset that requires only a small number of malicious names. We then present a series of experiments that use the same embedding for classifiers trained on datasets of increasing size. These experiments show that the embedding is particularly effective for classifiers trained on small datasets containing few malicious names.
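To make the notion of character n-grams concrete, the following is a minimal sketch of how a domain name might be split into overlapping n-grams and mapped to integer ids, which an embedding layer could then consume. The abstract does not state the value of n, the tokenization rules, or the embedding method, so the choice n = 3 and the function names `char_ngrams`, `build_vocab`, and `encode` are assumptions made purely for illustration, not the authors' implementation.

```python
from collections import Counter


def char_ngrams(name: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of a domain name.

    n = 3 is an illustrative default, not a value taken from the paper.
    """
    name = name.lower()
    if len(name) < n:
        # Names shorter than n yield a single short token (or nothing).
        return [name] if name else []
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def build_vocab(names, n: int = 3, min_count: int = 1) -> dict[str, int]:
    """Build an n-gram -> id mapping from a collection of names.

    Id 0 is reserved for n-grams that were never seen (hypothetical
    convention chosen for this sketch).
    """
    counts = Counter(g for name in names for g in char_ngrams(name, n))
    vocab = {"<unk>": 0}
    for gram, count in sorted(counts.items()):
        if count >= min_count:
            vocab[gram] = len(vocab)
    return vocab


def encode(name: str, vocab: dict[str, int], n: int = 3) -> list[int]:
    """Map a name to the id sequence of its n-grams; unseen n-grams map to 0."""
    return [vocab.get(g, 0) for g in char_ngrams(name, n)]


if __name__ == "__main__":
    vocab = build_vocab(["google"])  # stand-in for a large benign corpus
    print(char_ngrams("google"))
    print(encode("google", vocab))
    print(encode("xqzgle", vocab))
```

In a setting like the one the abstract describes, the vocabulary would presumably be built from the large benign dataset, so n-grams that occur only in algorithmically generated names fall back to the unknown id.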
Pages: 995-1004
Page count: 10