Ensuring High-Quality Private Data for Responsible Data Science: Vision and Challenges

Cited by: 16
Authors
Srivastava, Divesh [1]
Scannapieco, Monica [2]
Redman, Thomas C. [3]
Affiliations
[1] AT&T Labs Res, Room 4C202B,1 AT&T Way, Bedminster, NJ 07921 USA
[2] Italian Natl Inst Stat, Via C Balbo 16, I-00184 Rome, Italy
[3] Data Qual Solut, 12 Monmouth Ave, Rumson, NJ 07760 USA
Keywords
Responsible data science; data trust; private data; quality of private data
DOI
10.1145/3287168
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
High-quality data is critical for effective data science. As the use of data science has grown, so too have concerns that individuals' rights to privacy will be violated. This has led to the development of data protection regulations around the globe and the use of sophisticated anonymization techniques to protect privacy. Such measures make it more challenging for the data scientist to understand the data, exacerbating issues of data quality. Responsible data science aims to develop useful insights from the data while fully embracing these considerations. We pose the high-level problem in this article, "How can a data scientist develop the needed trust that private data has high quality?" We then identify a series of challenges for various data-centric communities and outline research questions for data quality and privacy researchers, which would need to be addressed to effectively answer the problem posed in this article.
Pages: 1 / 9
Number of pages: 9