Detection of malicious JavaScript on an imbalanced dataset

Cited by: 13
Authors
Phung, Ngoc Minh [1]
Mimura, Mamoru [1]
Affiliation
[1] National Defense Academy, 1-10-20 Hashirimizu, Yokosuka, Kanagawa, Japan
Keywords
Malicious JavaScript; Attention mechanism; Natural language processing; Oversampling; Machine learning
DOI
10.1016/j.iot.2021.100357
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To detect new malicious JavaScript at low cost, methods based on machine learning techniques have been proposed and have given positive results. These methods aim to build a lightweight filtering model that can quickly and precisely filter out malicious data for dynamic analysis. One such method constructs a language model with Natural Language Processing (NLP) techniques to represent source code in vector form for machine learning. This method achieves high scores on a balanced dataset; however, it has not been evaluated on an imbalanced dataset. Previous studies have mainly used balanced datasets, which are not representative of real-world data, and this raises questions about the practical use of such models. A good filter needs a model that achieves a high recall score on an imbalanced dataset. To construct an efficient language model and to address the data imbalance problem, we focus on oversampling techniques. Our method is the first to use oversampling and machine learning to detect malicious JavaScript. The experimental results show that our method detects new malicious JavaScript more accurately and efficiently, and that our model can quickly filter out malicious data for dynamic analysis. The best recall score, 0.72, is achieved with the Doc2Vec model. Our proposed method outperforms the baseline method by 210% in terms of recall score, with the same training time and test time per sample. (C) 2021 Elsevier B.V. All rights reserved.
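The abstract does not say which oversampling technique or classifier is used. As a hypothetical illustration only, the sketch below shows SMOTE (Synthetic Minority Over-sampling Technique), one common oversampling method, applied to fixed-length document vectors such as those a Doc2Vec model produces, together with the recall metric the abstract reports. The function names `smote` and `recall`, the toy vectors, and the choice of SMOTE itself are assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE sketch: create n_new synthetic minority vectors.

    Each synthetic vector lies on the line segment between a randomly chosen
    minority vector and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples to `base`.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN), treating label 1 as malicious."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Toy 2-D "document vectors" (hypothetical): 8 benign, 2 malicious.
    benign = [[float(i), 0.0] for i in range(8)]
    malicious = [[10.0, 10.0], [11.0, 12.0]]
    # Oversample the malicious class until both classes have 8 samples.
    extra = smote(malicious, n_new=len(benign) - len(malicious))
    print(len(malicious) + len(extra))  # 8
    print(recall([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 of 3 malicious samples found
```

In a setup like this, oversampling would normally be applied only to the training split, so that the test split keeps its natural imbalance and the reported recall reflects how many truly malicious samples the filter passes on for dynamic analysis.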
Pages: 12