A Data-Driven Analysis of Behaviors in Data Curation Processes

Cited by: 0
Authors
Han, Lei [1]
Chen, Tianwa [1]
Demartini, Gianluca [1]
Indulska, Marta [1]
Sadiq, Shazia [1]
Affiliation
[1] Univ Queensland, Brisbane, Qld, Australia
Keywords
Interaction behavior; search pattern; data curation; SOFTWARE;
DOI
10.1145/3567419
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Understanding how data workers interact with data, and with the various pieces of information related to data preparation, is key to designing systems that can better support them in exploring datasets. To date, however, there is a paucity of research studying the strategies adopted by data workers as they carry out data preparation activities. In this work, we investigate a specific data preparation activity, namely data quality discovery, and aim to (i) understand the behaviors of data workers in discovering data quality issues, (ii) explore what factors (e.g., prior experience) can affect their behaviors, as well as (iii) understand how these behavioral observations relate to their performance. To this end, we collect a multi-modal dataset through a data-driven experiment that relies on the use of eye-tracking technology with a purpose-designed platform built on top of IPython Notebook. The experiment results reveal that: (i) 'copy-paste-modify' is a typical strategy for writing code to complete tasks; (ii) proficiency in writing code has a significant impact on the quality of task performance, while perceived difficulty and efficacy can influence task completion patterns; and (iii) searching in external resources is a prevalent action that can be leveraged to achieve better performance. Furthermore, our experiment indicates that providing sample code within the system can help data workers get started with their task, and surfacing underlying data is an effective way to support exploration. By investigating data worker behaviors prior to each search action, we also find that the most common reasons that trigger external search actions are the need to seek assistance in writing or debugging code and to search for relevant code to reuse. Based on our experiment results, we showcase a systematic approach to select from among the best code snippets created by data workers and assemble them to achieve better performance than the best individual performer in the dataset. By doing so, our findings not only provide insights into patterns of interactions with various system components and information resources when performing data curation tasks, but also help build effective and efficient data curation processes through data workers' collective intelligence.
Pages: 35