Metadata-Based Detection of Child Sexual Abuse Material

Authors
Pereira, Mayana [1, 2]
Dodhia, Rahul [3]
Anderson, Hyrum [4]
Brown, Richard [5]
Affiliations
[1] Microsoft Corp, AI Good Res Lab, Redmond, WA 98052 USA
[2] Univ Brasilia, BR-70910900 Brasilia, Brazil
[3] Microsoft Corp, Redmond, WA 98052 USA
[4] Robust Intelligence, San Francisco, CA 94107 USA
[5] Project VIC Int, Neptune City, NJ 07753 USA
Keywords
Adversarial examples; CSAM; deep learning; digital crimes; file paths; machine learning; metadata
DOI
10.1109/TDSC.2023.3324275
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Child Sexual Abuse Media (CSAM) is any visual record of sexually explicit activity involving minors. Machine learning-based solutions can help law enforcement identify CSAM and block its distribution. However, collecting CSAM imagery to train machine learning models is subject to ethical and legal constraints. CSAM detection systems based on file metadata offer an alternative: metadata is not a record of a crime and is therefore clear of legal restrictions. This article proposes a CSAM detection framework consisting of machine learning models trained on file paths extracted from a real-world data set of over 1 million file paths obtained in criminal investigations. Our framework includes guidelines for model evaluation that account for data changes caused by adversarial data modification and for variations in data distribution caused by limited access to training data, as well as an assessment of false positive rates against file paths from Common Crawl data. We achieve accuracies as high as 0.97 while presenting stable behavior under adversarial attacks previously used in natural language tasks. When evaluating the model on publicly available file paths from Common Crawl data, we observed a false positive rate of 0.002, showing that the model maintains low false positive rates when operating on distinct data distributions.
Pages: 3153-3164
Number of pages: 12
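
The false positive rate quoted in the abstract is measured on a benign-only set (file paths from Common Crawl), where every positive prediction is by definition a false positive. A minimal sketch of that arithmetic follows, assuming binary labels with 1 for "flagged" and 0 for "benign". The function name and the toy counts are hypothetical and are not taken from the paper.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN), computed over the truly negative samples only."""
    # Count benign samples (label 0) that the model flagged (prediction 1).
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Count benign samples that the model correctly left unflagged.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no negative samples; FPR is undefined")
    return fp / (fp + tn)


# Toy illustration with made-up counts: 1000 benign paths, 2 flagged by mistake.
y_true = [0] * 1000
y_pred = [1] * 2 + [0] * 998
rate = false_positive_rate(y_true, y_pred)
print(rate)
```

Positive samples, if present in `y_true`, do not enter the ratio, so the same function also works on a mixed evaluation set.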