Enabling Synthetic Data adoption in regulated domains

Cited by: 1
Authors
Visani, Giorgio [1, 2]
Graffi, Giacomo [2 ]
Alfero, Mattia [2 ]
Bagli, Enrico [2 ]
Chesani, Federico [1 ]
Capuzzo, Davide [2 ]
Affiliations
[1] Univ Bologna, DISI Dept, Bologna, Italy
[2] CRIF SpA, R&D Dept, Bologna, Italy
Keywords
synthetic data; benchmarks; goodness evaluation; data utility; privacy; finance; data sharing
DOI
10.1109/DSAA54385.2022.10032356
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The switch from a Model-Centric to a Data-Centric mindset puts the emphasis on data and its quality rather than on algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause loss of information, creating a crucial trade-off between data quality and privacy. A clever way to bypass this conundrum relies on Synthetic Data: data obtained from a generative process that learns the properties of the real data. Both Academia and Industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use case, a variety of generative algorithms have been trained on real-world Credit Bureau Data, and the best model has been assessed by applying DAISYnt to the different synthetic replicas. Further potential uses include, among others, auditing and fine-tuning generative models or ensuring the high quality of a given synthetic dataset. From a prescriptive viewpoint, DAISYnt may eventually pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
Pages: 475-484
Number of pages: 10