Metadata-Based Detection of Child Sexual Abuse Material

Cited by: 0
Authors
Pereira, Mayana [1,2]
Dodhia, Rahul [3]
Anderson, Hyrum [4]
Brown, Richard [5]
Affiliations
[1] Microsoft Corp, AI Good Res Lab, Redmond, WA 98052 USA
[2] Univ Brasilia, BR-70910900 Brasilia, Brazil
[3] Microsoft Corp, Redmond, WA 98052 USA
[4] Robust Intelligence, San Francisco, CA 94107 USA
[5] Project VIC Int, Neptune City, NJ 07753 USA
Keywords
Adversarial examples; CSAM; deep learning; digital crimes; file paths; machine learning; metadata
DOI
10.1109/TDSC.2023.3324275
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Child Sexual Abuse Media (CSAM) is any visual record of a sexually explicit activity involving minors. Machine learning-based solutions can help law enforcement identify CSAM and block distribution. Yet collecting CSAM imagery to train machine learning models is subject to ethical and legal constraints. CSAM detection systems based on file metadata offer several opportunities. Metadata is not a record of a crime and is therefore clear of legal restrictions. This article proposes a CSAM detection framework consisting of machine learning models trained on file paths extracted from a real-world data set of over 1 million file paths obtained in criminal investigations. Our framework includes guidelines for model evaluation that account for data changes caused by adversarial data modification and variations in data distribution caused by limited access to training data, as well as an assessment of false positive rates against file paths from Common Crawl data. We achieve accuracies as high as 0.97 while presenting stable behavior under adversarial attacks previously used in natural language tasks. When evaluating the model on publicly available file paths from Common Crawl data, we observed a false positive rate of 0.002, showing that the model maintains low false positive rates when operating on distinct data distributions.
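The abstract reports two metrics, accuracy and a false positive rate measured on benign file paths. The sketch below only illustrates how such metrics are commonly computed and how a file path might be turned into character n-gram counts before being passed to a model. The function names, the choice of character trigrams, and the example path are assumptions made for illustration. They are not taken from the article, which does not describe its features or models in this record.

```python
from collections import Counter


def char_ngrams(path: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased file path.

    Hypothetical featurization, assumed for illustration only.
    """
    text = path.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def false_positive_rate(y_true, y_pred) -> float:
    """Return FP / (FP + TN), where 1 means flagged and 0 means benign."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def accuracy(y_true, y_pred) -> float:
    """Return the fraction of predictions that match the labels."""
    if not y_true:
        return 0.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # A neutral example path; real inputs would come from a labeled data set.
    features = char_ngrams("/home/user/docs/report.pdf")
    print(len(features), "distinct trigrams")

    # A benign-only evaluation set: every flagged item is a false positive.
    benign_labels = [0] * 1000
    predictions = [1] * 2 + [0] * 998
    print("FPR:", false_positive_rate(benign_labels, predictions))
```

With a benign-only evaluation set, as in the usage example, every flagged path counts as a false positive, so the rate is the flagged count divided by the set size. This is one plausible reading of how a false positive rate against public crawl data could be obtained, not a statement of the authors' procedure.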
Pages: 3153-3164
Number of pages: 12