Ensuring High-Quality Private Data for Responsible Data Science: Vision and Challenges

Cited by: 16
Authors
Srivastava, Divesh [1]
Scannapieco, Monica [2]
Redman, Thomas C. [3]
Affiliations
[1] AT&T Labs Res, Room 4C202B,1 AT&T Way, Bedminster, NJ 07921 USA
[2] Italian Natl Inst Stat, Via C Balbo 16, I-00184 Rome, Italy
[3] Data Qual Solut, 12 Monmouth Ave, Rumson, NJ 07760 USA
Keywords
Responsible data science; data trust; private data; quality of private data
DOI
10.1145/3287168
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
High-quality data is critical for effective data science. As the use of data science has grown, so too have concerns that individuals' rights to privacy will be violated. This has led to the development of data protection regulations around the globe and the use of sophisticated anonymization techniques to protect privacy. Such measures make it more challenging for the data scientist to understand the data, exacerbating issues of data quality. Responsible data science aims to develop useful insights from the data while fully embracing these considerations. We pose the high-level problem in this article, "How can a data scientist develop the needed trust that private data has high quality?" We then identify a series of challenges for various data-centric communities and outline research questions for data quality and privacy researchers, which would need to be addressed to effectively answer the problem posed in this article.
Pages: 1 / 9
Page count: 9
Related papers
50 items in total
  • [31] Special issue on responsible data management and data science
    Huang, Zi
    Shen, Yanyan
    Srivastava, Divesh
    VLDB JOURNAL, 2022, 31 (05): 823 - 823
  • [32] Special issue on responsible data management and data science
    Zi Huang
    Yanyan Shen
    Divesh Srivastava
    The VLDB Journal, 2022, 31 : 823 - 823
  • [33] ORGANIZATION OF DATA INPUT - THE IMPORTANCE OF RAPID HIGH-QUALITY DATA-COLLECTION
    DEDOMBAL, FT
    ENDOSCOPY, 1992, 24 : 490 - 492
  • [34] Streaming-data algorithms for high-quality clustering
    O'Callaghan, L
    Mishra, N
    Meyerson, A
    Guha, S
    Motwani, R
    18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002: 685 - 694
  • [35] DATA SYSTEM FOR THE ECONOMICAL PRODUCTION OF HIGH-QUALITY YARNS
    [Author unknown]
    MELLIAND TEXTILBERICHTE INTERNATIONAL TEXTILE REPORTS, 1979, 60 (08): 640 - 640
  • [36] Efficient High-Quality Volume Rendering of SPH Data
    Fraedrich, Roland
    Auer, Stefan
    Westermann, Ruediger
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2010, 16 (06) : 1533 - 1540
  • [37] Do big numbers assure high-quality of data?
    Crocetti, Emanuele
    Buzzoni, Carlotta
    LANCET HAEMATOLOGY, 2017, 4 (07): E309 - E309
  • [38] High-Quality Data for Health Care and Health Research
    Stausberg, Juergen
    Harkener, Sonja
    METHODS OF INFORMATION IN MEDICINE, 2023, 62 (01/02) : 1 - 4
  • [39] The importance of high-quality data and analytics during the pandemic
    Petrescu, Maria
    Krishen, Anjala S.
    JOURNAL OF MARKETING ANALYTICS, 2020, 8 (02) : 43 - 44
  • [40] The importance of high-quality data and analytics during the pandemic
    Maria Petrescu
    Anjala S. Krishen
    Journal of Marketing Analytics, 2020, 8 : 43 - 44