PIILO: an open-source system for personally identifiable information labeling and obfuscation

Times cited: 1
Authors
Holmes, Langdon [1]
Crossley, Scott [2]
Sikka, Harshvardhan [3]
Morris, Wesley [1]
Affiliations
[1] Vanderbilt Univ, Dept Psychol & Human Dev, Nashville, TN 37235 USA
[2] Vanderbilt Univ, Dept Special Educ, Nashville, TN USA
[3] Georgia Tech, Dept Comp Sci, Atlanta, GA USA
Funding
U.S. National Science Foundation
Keywords
Privacy; Deidentification; Anonymization; Student data; Hiding in plain sight; Transformer; Learning analytics
DOI
10.1108/ILS-04-2023-0032
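Chinese Library Classification (CLC) number
G25 [Library science and librarianship]; G35 [Information science and information work]
Discipline classification code
1205; 120501
Abstract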
Purpose: This study aims to report on an automatic deidentification system for labeling and obfuscating personally identifiable information (PII) in student-generated text.

Design/methodology/approach: The authors evaluate the performance of their deidentification system on two data sets of student-generated text. Each data set was human-annotated for PII. The authors evaluate using two approaches: per-token PII classification accuracy and a simulated reidentification attack design. In the reidentification attack, two reviewers attempted to recover student identities from the data after PII was obfuscated by the authors' system. In both cases, results are reported in terms of recall and precision.

Findings: The authors' deidentification system recalled 84% of student name tokens in their first data set (96% of full names). On the second data set, it achieved a recall of 74% for student name tokens (91% of full names) and 75% for all direct identifiers. After the second data set was obfuscated by the authors' system, two reviewers attempted to recover the identities of students from the obfuscated data. They performed below chance, indicating that the obfuscated data presents a low identity disclosure risk.

Research limitations/implications: The two data sets used in this study are not representative of all forms of student-generated text, so further work is needed to evaluate performance on more data.

Practical implications: This paper presents an open-source and automatic deidentification system appropriate for student-generated text with technical explanations and evaluations of performance.

Originality/value: Previous study on text deidentification has shown success in the medical domain. This paper develops on these approaches and applies them to text in the educational domain.
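As an illustration of the per-token recall and precision reported in the abstract, the following is a minimal sketch, assuming each token carries a binary human annotation (PII or not) and a binary system prediction. The function name `token_precision_recall` and the sample sentence are hypothetical and are not taken from PIILO; the authors' actual evaluation code may differ.

```python
from typing import Sequence, Tuple


def token_precision_recall(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, recall) for binary per-token PII labels.

    gold[i] is True when a human annotator marked token i as PII;
    predicted[i] is True when the system labeled token i as PII.
    Hypothetical helper for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_pos = sum(1 for g, p in zip(gold, predicted) if not g and p)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against division by zero when nothing was predicted or annotated.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Invented example sentence: the annotator marks "Jane" and "Doe" as name tokens,
# while the (hypothetical) system output catches only "Jane".
tokens = ["My", "name", "is", "Jane", "Doe", "and", "I", "like", "math", "class"]
gold = [t in {"Jane", "Doe"} for t in tokens]
predicted = [t == "Jane" for t in tokens]

precision, recall = token_precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented example the system labels nothing incorrectly but misses one of the two name tokens, which gives a precision of 1.0 and a recall of 0.5.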
Pages: 266-284
Page count: 19