Metadata-Based Detection of Child Sexual Abuse Material

Authors
Pereira, Mayana [1, 2]
Dodhia, Rahul [3]
Anderson, Hyrum [4]
Brown, Richard [5]
Affiliations
[1] Microsoft Corp, AI Good Res Lab, Redmond, WA 98052 USA
[2] Univ Brasilia, BR-70910900 Brasilia, Brazil
[3] Microsoft Corp, Redmond, WA 98052 USA
[4] Robust Intelligence, San Francisco, CA 94107 USA
[5] Project VIC Int, Neptune City, NJ 07753 USA
Keywords
Adversarial examples; CSAM; deep learning; digital crimes; file paths; machine learning; metadata
DOI
10.1109/TDSC.2023.3324275
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Child Sexual Abuse Media (CSAM) is any visual record of sexually explicit activity involving minors. Machine learning-based solutions can help law enforcement identify CSAM and block its distribution. Yet collecting CSAM imagery to train machine learning models is subject to ethical and legal constraints. CSAM detection systems based on file metadata offer several opportunities: metadata is not a record of a crime and is therefore clear of legal restrictions. This article proposes a CSAM detection framework consisting of machine learning models trained on file paths extracted from a real-world data set of over 1 million file paths obtained in criminal investigations. Our framework includes guidelines for model evaluation that account for data changes caused by adversarial data modification and for variations in data distribution caused by limited access to training data, as well as an assessment of false positive rates against file paths from Common Crawl data. We achieve accuracies as high as 0.97 while presenting stable behavior under adversarial attacks previously used in natural language tasks. When evaluating the model on publicly available file paths from Common Crawl data, we observed a false positive rate of 0.002, showing that the model maintains low false positive rates when operating on distinct data distributions.
Pages: 3153-3164
Page count: 12
Related papers
50 in total
  • [21] Psychological Perspectives of Virtual Child Sexual Abuse Material
    Larissa S. Christensen
    Dominique Moritz
    Ashley Pearson
    Sexuality & Culture, 2021, 25 : 1353 - 1365
  • [22] Detecting child sexual abuse material: A comprehensive survey
    Lee, Hee-Eun
    Ermakova, Tatiana
    Ververis, Vasilis
    Fabian, Benjamin
    FORENSIC SCIENCE INTERNATIONAL-DIGITAL INVESTIGATION, 2020, 34
  • [23] The Challenges of Identifying and Classifying Child Sexual Abuse Material
    Kloess, Juliane A.
    Woodhams, Jessica
    Whittle, Helen
    Grant, Tim
    Hamilton-Giachritsis, Catherine E.
    SEXUAL ABUSE-A JOURNAL OF RESEARCH AND TREATMENT, 2019, 31 (02) : 173 - 196
  • [24] Psychological Perspectives of Virtual Child Sexual Abuse Material
    Christensen, Larissa S.
    Moritz, Dominique
    Pearson, Ashley
    SEXUALITY & CULTURE-AN INTERDISCIPLINARY JOURNAL, 2021, 25 (04) : 1353 - 1365
  • [25] Projective techniques and the detection of child sexual abuse
    Garb, HN
    Wood, JM
    Nezworski, MT
    CHILD ABUSE & NEGLECT, 2000, 24 (04) : 437 - 438
  • [26] Sexual posttraumatic stress among investigators of child sexual abuse material
    Gewirtz-Meydan, Ateret
    Mitchell, Kimberly J.
    O'Brien, Jennifer E.
    POLICING-A JOURNAL OF POLICY AND PRACTICE, 2023
  • [27] Child sexual abuse - Cultural diversity and child sexual abuse
    Kitayama, A
    SEXUALITY AND HUMAN BONDING, 1996, 1095 : 241 - 244
  • [28] A metadata-based RCBR transmission of video-on-demand
    Song, H
    2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2002 : 181 - 184
  • [29] CHIME: A metadata-based distributed software development environment
    Dossick, SE
    Kaiser, GE
    SOFTWARE ENGINEERING - ESEC/FSE '99, PROCEEDINGS, 1999, 1687 : 464 - 475
  • [30] A Metadata-Based Architectural Model for Dynamically Resilient Systems
    Serugendo, Giovanna Di Marzo
    Fitzgerald, John
    Romanovsky, Alexander
    Guelfi, Nicolas
    APPLIED COMPUTING 2007, VOL 1 AND 2, 2007, : 566 - +