Reconstructing training data from document understanding models

Authors
Dentan, Jeremie [1, 2]
Paran, Arnaud [1]
Shabou, Aymen [1]
Affiliations
[1] Credit Agr SA, Montrouge, France
[2] IP Paris, Ecole Polytech, Paris, France
Abstract
Document understanding models are increasingly employed by companies to supplant humans in processing sensitive documents, such as invoices, tax notices, or even ID cards. However, the robustness of such models to privacy attacks remains vastly unexplored. This paper presents CDMI, the first reconstruction attack designed to extract sensitive fields from the training data of these models. We attack LayoutLM and BROS architectures, demonstrating that an adversary can perfectly reconstruct up to 4.1% of the fields of the documents used for fine-tuning, including some names, dates, and invoice amounts up to six-digit numbers. When our reconstruction attack is combined with a membership inference attack, our attack accuracy escalates to 22.5%. In addition, we introduce two new end-to-end metrics and evaluate our approach under various conditions: unimodal or bimodal data, LayoutLM or BROS backbones, four fine-tuning tasks, and two public datasets (FUNSD and SROIE). We also investigate the interplay between overfitting, predictive performance, and susceptibility to our attack. We conclude with a discussion on possible defenses against our attack and potential future research directions to construct robust document understanding models.
Pages: 6813-6830
Page count: 18