A novel framework for Chinese personal sensitive information detection

Times cited: 0
Authors
Ren, Chenglong [1]
Lan, Xiao [2,4]
Chen, Xingshu [1,2]
Luo, Yonggang [2]
Ruan, Shuhua [1,2,3]
Affiliations
[1] Sichuan Univ, Sch Cyber Sci & Engn, Chengdu, Peoples R China
[2] Sichuan Univ, Cyber Sci Res Inst, Chengdu, Peoples R China
[3] Sichuan Univ, Sch Cyber Sci & Engn, Chengdu 610000, Peoples R China
[4] Sichuan Univ, Cyber Sci Res Inst, Chengdu 610000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese; personal sensitive information; rule matching; sequence labeling; context analysis; MODEL;
DOI
10.1080/09540091.2023.2298310
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
With the rapid development of social networks, the harm caused by leaks of personal sensitive information is becoming increasingly serious. To detect and identify personal sensitive information, existing methods build matching rules to detect specific sensitive entities and use machine learning methods to classify sensitive text. These methods struggle with context analysis and with adapting to the characteristics of the Chinese language. This paper proposes CPSID, a method for detecting Chinese personal sensitive information. On the one hand, CPSID uses rule matching to detect specific personal sensitive information that contains only letters and digits. More importantly, CPSID constructs a sequence labelling model named EBC (ELECTRA-BiLSTM-CRF) to detect more complex personal sensitive information that consists of Chinese characters. The EBC model uses ELECTRA to generate word embeddings and uses BiLSTM and CRF layers to extract personal sensitive information, which allows it to detect Chinese sensitive entities accurately by analysing context. The model achieves an F1 score of 94.09% on Chinese datasets, outperforming comparable models. Additionally, experiments on real-world data show that CPSID achieves better detection results than either individual method (rule matching or sequence labelling) used alone.
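The two-branch design described in the abstract (rules for alphanumeric entities, a sequence labeller for entities made of Chinese characters) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expressions, the hypothetical entity labels (`NAME`, `PHONE`, `ID_CARD`, `EMAIL`), the GB 11643-1999 check-character test, the hypothetical helper names, and the "keep the longer span" merge policy are all choices made here for illustration. The per-character BIO tags are assumed to come from a model such as EBC and are supplied by hand rather than computed.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive

# Hypothetical rule set for illustration; CPSID's actual patterns are not given.
RULES = {
    # 11-digit mainland mobile number not embedded in a longer run of digits
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # 18-character resident ID number: 17 digits plus a digit or X check character
    "ID_CARD": re.compile(r"(?<![\dXx])\d{17}[\dXx](?![\dXx])"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def id_checksum_ok(id18: str) -> bool:
    """Check the GB 11643-1999 check character of an 18-character ID number."""
    weights = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
    check_chars = "10X98765432"  # indexed by (weighted sum mod 11)
    total = sum(int(c) * w for c, w in zip(id18[:17], weights))
    return check_chars[total % 11] == id18[17].upper()


def rule_spans(text: str) -> List[Span]:
    """Branch 1: regex matching for entities made of letters and digits only."""
    spans = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            if label == "ID_CARD" and not id_checksum_ok(m.group()):
                continue  # drop 18-digit strings whose check character is wrong
            spans.append((m.start(), m.end(), label))
    return spans


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Branch 2: turn per-character BIO tags (e.g. from a model like EBC) into spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # the current entity continues
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]  # a stray I- tag also opens a new entity
    return spans


def merge_spans(spans: List[Span]) -> List[Span]:
    """Keep non-overlapping spans, preferring longer ones on conflict."""
    kept: List[Span] = []
    for s in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):  # longest first
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):
            kept.append(s)
    return sorted(kept)


def detect(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Combine both branches; `tags` is assumed to come from a sequence labeller."""
    if len(tags) != len(text):
        raise ValueError("expected one BIO tag per character")
    merged = merge_spans(rule_spans(text) + bio_to_spans(tags))
    return [(text[s:e], label) for s, e, label in merged]


text = "张三的手机号是13812345678"
tags = ["B-NAME", "I-NAME"] + ["O"] * 16  # stand-in for model output
print(detect(text, tags))  # [('张三', 'NAME'), ('13812345678', 'PHONE')]
```

Preferring the longer span when the two branches overlap (for example, a phone-like digit run inside an email address) is only one simple way to combine them; the abstract does not say how CPSID resolves such conflicts.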
Pages: 23