Enabling Synthetic Data adoption in regulated domains

Times cited: 1
Authors
Visani, Giorgio [1, 2]
Graffi, Giacomo [2]
Alfero, Mattia [2]
Bagli, Enrico [2]
Chesani, Federico [1]
Capuzzo, Davide [2]
Affiliations
[1] Univ Bologna, DISI Dept, Bologna, Italy
[2] CRIF SpA, R&D Dept, Bologna, Italy
Keywords
synthetic data; benchmarks; goodness evaluation; data utility; privacy; finance; data sharing
DOI
10.1109/DSAA54385.2022.10032356
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The switch from a Model-Centric to a Data-Centric mindset is shifting the emphasis from algorithms to data and its quality, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause a loss of information, creating a crucial trade-off between data quality and privacy. A clever way to bypass this conundrum relies on Synthetic Data: data obtained from a generative process that learns the properties of the real data. Both Academia and Industry have recognized the importance of evaluating synthetic data quality: without all-round, reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use case, a variety of generative algorithms have been trained on real-world Credit Bureau Data, and the best model has been assessed by applying DAISYnt to the different synthetic replicas. Other potential uses include auditing and fine-tuning generative models, or ensuring the high quality of a given synthetic dataset. From a prescriptive viewpoint, DAISYnt may eventually pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
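The abstract does not spell out which tests DAISYnt contains. As a hedged illustration only, the Python sketch below shows two generic checks often used when comparing a real table with a synthetic replica: a fidelity check (the two-sample Kolmogorov-Smirnov statistic on one numeric column) and a privacy check (distance to closest record, DCR). These checks are assumptions chosen for illustration, not the DAISYnt implementation; the function names and toy data are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def ks_statistic(real: Sequence[float], synth: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic (hypothetical helper).

    Returns the largest absolute gap between the empirical CDFs of the
    real and synthetic samples; 0 means identical marginal distributions.
    """
    a, b = sorted(real), sorted(synth)
    # Both ECDFs are step functions that only change at observed values,
    # so the supremum of their difference is attained at one of those values.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def distance_to_closest_record(
    real_rows: Sequence[Sequence[float]],
    synth_rows: Sequence[Sequence[float]],
) -> List[float]:
    """For each synthetic row, the Euclidean distance to its nearest real row
    (hypothetical helper). A distance of 0 means the synthetic row is an
    exact copy of a real record, i.e. a potential privacy leak."""
    return [min(math.dist(s, r) for r in real_rows) for s in synth_rows]


if __name__ == "__main__":
    # Toy numeric column: the synthetic replica distorts only the largest value.
    print(ks_statistic([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 10.0]))  # 0.25
    # Toy two-column rows: the first synthetic row copies a real record.
    real = [(0.0, 0.0), (3.0, 4.0)]
    synth = [(0.0, 0.0), (6.0, 8.0)]
    print(distance_to_closest_record(real, synth))  # [0.0, 5.0]
```

The two functions pull in opposite directions, mirroring the quality/privacy trade-off described in the abstract: a generator that simply copies real rows would score perfectly on the KS check while producing DCR values of 0.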
Pages: 475-484
Number of pages: 10