Research directions in data wrangling: Visualizations and transformations for usable and credible data

Cited by: 174
Authors
Kandel, Sean [1]
Heer, Jeffrey [1]
Plaisant, Catherine [2]
Kennedy, Jessie [3]
van Ham, Frank
Riche, Nathalie Henry [4]
Weaver, Chris [5]
Lee, Bongshin [4]
Brodbeck, Dominique
Buono, Paolo [6]
Affiliations
[1] Stanford Univ, Dept Comp Sci, San Francisco, CA 94107 USA
[2] Univ Maryland, Human Comp Interact Lab, College Pk, MD 20742 USA
[3] Edinburgh Napier Univ, Inst Informat & Digital Innovat, Edinburgh, Midlothian, Scotland
[4] Microsoft Res, Redmond, WA USA
[5] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
[6] Univ Bari Aldo Moro, Dipartimento Informat, Bari, Italy
Funding
US National Science Foundation
Keywords
data cleaning; data quality; data transformation; uncertainty; visualization; TOOL;
DOI
10.1177/1473871611415994
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process of 'data wrangling' often constitutes the most tedious and time-consuming aspect of analysis. Though data cleaning and integration are longstanding issues in the database community, relatively little research has explored how interactive visualization can advance the state of the art. In this article, we review the challenges and opportunities associated with addressing data quality issues. We argue that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization. We identify a number of outstanding research questions, including how appropriate visual encodings can facilitate apprehension of missing data, discrepant values, and uncertainty; how interactive visualizations might facilitate data transform specification; and how recorded provenance and social interaction might enable wider reuse, verification, and modification of data transformations.
Pages: 271-288
Page count: 18