A novel framework for Chinese personal sensitive information detection

Cited by: 0
Authors
Ren, Chenglong [1]
Lan, Xiao [2, 4]
Chen, Xingshu [1, 2]
Luo, Yonggang [2]
Ruan, Shuhua [1, 2, 3]
Affiliations
[1] Sichuan Univ, Sch Cyber Sci & Engn, Chengdu, Peoples R China
[2] Sichuan Univ, Cyber Sci Res Inst, Chengdu, Peoples R China
[3] Sichuan Univ, Sch Cyber Sci & Engn, Chengdu 610000, Peoples R China
[4] Sichuan Univ, Cyber Sci Res Inst, Chengdu 610000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Chinese; personal sensitive information; rule matching; sequence labeling; context analysis; MODEL
DOI
10.1080/09540091.2023.2298310
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of social networks, the harm caused by leaks of personal sensitive information is becoming increasingly serious. To detect and identify such information, existing methods build matching rules for specific sensitive entities and use machine learning to classify sensitive text. These methods struggle with context analysis and with adapting to the characteristics of the Chinese language. This paper proposes CPSID, a method for detecting Chinese personal sensitive information. On the one hand, CPSID utilises rule matching to detect personal sensitive information that contains only letters and numbers. On the other hand, and more importantly, CPSID constructs a sequence labelling model named EBC (ELECTRA-BiLSTM-CRF) to detect more complex personal sensitive information that consists of Chinese characters. The EBC model uses the ELECTRA algorithm to implement word embedding and uses BiLSTM and CRF models to extract personal sensitive information, detecting Chinese sensitive entities accurately by analysing context information. The model achieves an F1 score of 94.09% on Chinese datasets, outperforming other similar models. Additionally, experiments on real data show that CPSID achieves better detection results than either method alone (rule matching or sequence labelling).
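The rule-matching half of the approach can be illustrated with a minimal sketch. The patterns below are hypothetical examples of letter-and-number entities (an 11-digit mainland mobile number, an 18-character resident ID number, an email address); they are assumptions made for illustration, not the rules used by CPSID, and the `detect` function name is likewise invented here.

```python
import re

# Hypothetical patterns for illustration only; the abstract does not list
# the actual rules used by CPSID.
PATTERNS = {
    # 11 digits starting with 1 and a second digit in 3-9, not embedded in a longer number
    "mobile_phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # 17 digits followed by a digit or X, not embedded in a longer number
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # A deliberately simple email shape
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def detect(text):
    """Return (label, matched_text, start, end) tuples sorted by start offset."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    sample = "联系我: 13812345678, 邮箱 zhang.san@example.com"
    for hit in detect(sample):
        print(hit)
```

Rules of this kind only check the surface shape of a string. Entities written in Chinese characters, such as names or addresses, have no fixed shape, which is the case the abstract assigns to the EBC sequence labelling model.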
Pages: 23