Metadata-Based Detection of Child Sexual Abuse Material

Times cited: 0
Authors
Pereira, Mayana [1 ,2 ]
Dodhia, Rahul [3 ]
Anderson, Hyrum [4 ]
Brown, Richard [5 ]
Affiliations
[1] Microsoft Corp, AI Good Res Lab, Redmond, WA 98052 USA
[2] Univ Brasilia, BR-70910900 Brasilia, Brazil
[3] Microsoft Corp, Redmond, WA 98052 USA
[4] Robust Intelligence, San Francisco, CA 94107 USA
[5] Project VIC Int, Neptune City, NJ 07753 USA
Keywords
Adversarial examples; CSAM; deep learning; digital crimes; file paths; machine learning; metadata
DOI
10.1109/TDSC.2023.3324275
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Child Sexual Abuse Media (CSAM) is any visual record of sexually explicit activity involving minors. Machine learning-based solutions can help law enforcement identify CSAM and block its distribution. However, collecting CSAM imagery to train machine learning models is subject to ethical and legal constraints. CSAM detection systems based on file metadata offer an alternative: metadata is not a record of a crime and is therefore clear of legal restrictions. This article proposes a CSAM detection framework consisting of machine learning models trained on a real-world data set of over 1 million file paths obtained in criminal investigations. The framework includes guidelines for model evaluation that account for data changes caused by adversarial data modification and for variations in data distribution caused by limited access to training data, as well as an assessment of false positive rates against file paths from Common Crawl data. The models achieve accuracies as high as 0.97 and remain stable under adversarial attacks previously used in natural language tasks. When evaluating the model on publicly available file paths from Common Crawl data, we observed a false positive rate of 0.002, showing that the model maintains low false positive rates when operating on a distinct data distribution.
Pages: 3153 - 3164
Number of pages: 12
Related papers
50 in total
  • [41] A Metadata-Based Approach for Unstructured Document Management in Organizations
    Paganelli, Federica
    Pettenati, Maria
    Giuli, Dino
    INFORMATION RESOURCES MANAGEMENT JOURNAL, 2006, 19 (01) : 1 - 22
  • [42] Beyond Learned Metadata-Based Raw Image Reconstruction
    Wang, Yufei
    Yu, Yi
    Yang, Wenhan
    Guo, Lanqing
    Chau, Lap-Pui
    Kot, Alex C.
    Wen, Bihan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (12) : 5514 - 5533
  • [43] Anomaly Detection in Imbalanced Encrypted Traffic with Few Packet Metadata-Based Feature Extraction
    Kim, Min-Gyu
    Kim, Hwankuk
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2024, 141 (01) : 585 - 607
  • [44] Parental Production of Child Sexual Abuse Material: A Critical Review
    Salter, Michael
    Wong, Tim
    TRAUMA VIOLENCE & ABUSE, 2024, 25 (03) : 1826 - 1837
  • [45] Cyber strategies used to combat child sexual abuse material
    Christensen, Larissa S.
    Edwards, Graeme
    Rayment-McHugh, Susan
    Jones, Christian
    TRENDS AND ISSUES IN CRIME AND CRIMINAL JUSTICE, 2021, (636) : 1 - 16
  • [46] Rick Brown, Eliminating Online Child Sexual Abuse Material
    Habibi, Mahmud Nasrul
    Mastuti, Endah
    Andriani, Fitri
    JOURNAL OF CRIMINAL JUSTICE EDUCATION, 2024,
  • [47] Production and distribution of child sexual abuse material by parental figures
    Salter, Michael
    Wong, W. K. Tim
    Breckenridge, Jan
    Scott, Sue
    Cooper, Sharon
    Peleg, Noam
    TRENDS AND ISSUES IN CRIME AND CRIMINAL JUSTICE, 2021, (616) : 1 - 17
  • [48] Book review: Eliminating Online Child Sexual Abuse Material
    Wijaya, I. Made Marta
    Mulya, Muhammad Alyan
    Widiputranto, A. Prayudi
    Paulus, John Benyamin M.
    PROBATION JOURNAL, 2024, 71 (03) : 313 - 315
  • [49] Metadata-based Feature Aggregation Network for Face Recognition
    Sankaran, Nishant
    Tulyakov, Sergey
    Setlur, Srirangaraj
    Govindaraju, Venu
    2018 INTERNATIONAL CONFERENCE ON BIOMETRICS (ICB), 2018 : 118 - 123
  • [50] A metadata-based access control model for web services
    Yague, MI
    Maña, A
    Lopez, J
    INTERNET RESEARCH, 2005, 15 (01) : 99 - 116