Is It Overkill? Analyzing Feature-Space Concept Drift in Malware Detectors

Authors
Chen, Zhi [1]
Zhang, Zhenning [1]
Kan, Zeliang [2, 3]
Yang, Limin [1]
Cortellazzi, Jacopo [2, 3]
Pendlebury, Feargus [3]
Pierazzi, Fabio [2]
Cavallaro, Lorenzo [3]
Wang, Gang [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61081 USA
[2] Kings Coll London, London, England
[3] UCL, London, England
DOI
10.1109/SPW59333.2023.00007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Concept drift is a major challenge faced by machine learning-based malware detectors when deployed in practice. While existing works have investigated methods to detect concept drift, the main causes behind the drift are not yet well understood. In this paper, we design experiments to empirically analyze the impact of feature-space drift (new features introduced by new samples) and compare it with data-space drift (data distribution shift over existing features). Surprisingly, we find that data-space drift is the dominant contributor to model degradation over time, while feature-space drift has little to no impact. This is consistently observed across both Android and PE malware detectors, with different feature types and feature engineering methods, across different settings. We further validate this observation with recent online-learning-based malware detectors that incrementally update the feature space. Our results indicate that concept drift may be handled without frequent feature updates, and we further discuss open questions for future research.
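The distinction between the two kinds of drift can be illustrated with a minimal sketch. This is not the authors' experimental method. It assumes each sample is represented as a set of string features, and the function names, the feature names in the usage note, and both proxy measures (the share of unseen features and the mean frequency change) are hypothetical choices made only for illustration.

```python
from collections import Counter


def feature_frequencies(samples):
    """Return, for each feature, the fraction of samples that contain it."""
    counts = Counter(f for s in samples for f in set(s))
    n = len(samples)
    return {f: c / n for f, c in counts.items()}


def drift_summary(train_samples, test_samples):
    """Separate two illustrative drift proxies (assumed measures, not the paper's).

    - new_feature_ratio: share of test-time features never seen in training
      (a proxy for feature-space drift).
    - mean_frequency_shift: mean absolute change in per-feature frequency
      over the features known at training time (a proxy for data-space drift).
    """
    train_freq = feature_frequencies(train_samples)
    test_freq = feature_frequencies(test_samples)

    known = set(train_freq)
    new_features = set(test_freq) - known

    new_ratio = len(new_features) / len(test_freq) if test_freq else 0.0
    shift = (
        sum(abs(test_freq.get(f, 0.0) - train_freq[f]) for f in known) / len(known)
        if known
        else 0.0
    )
    return {
        "new_features": new_features,
        "new_feature_ratio": new_ratio,
        "mean_frequency_shift": shift,
    }
```

For example, with training samples `[{"perm.SMS", "api.getDeviceId"}, {"perm.SMS"}]` and test samples `[{"perm.SMS", "api.loadLibrary"}, {"api.getDeviceId", "api.loadLibrary"}]`, the only new feature is `api.loadLibrary`, while the frequency of the existing feature `perm.SMS` drops from 1.0 to 0.5. A detector with a fixed feature space would only see the second effect.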
Pages: 21-28
Number of pages: 8