AI Assistants: A Framework for Semi-Automated Data Wrangling

Cited by: 1
Authors
Petricek, Tomas [1]
van den Burg, Gerrit J. J. [2, 3]
Nazabal, Alfredo [2, 3]
Ceritli, Taha [2, 4, 5]
Jimenez-Ruiz, Ernesto [6, 7]
Williams, Christopher K. I. [2, 8]
Affiliations
[1] Charles Univ Prague, Prague 11000, Czech Republic
[2] Alan Turing Inst, London NW1 2DB, England
[3] Amazon Dev Ctr Scotland, Edinburgh EH1 3EG, Scotland
[4] Univ Edinburgh, Edinburgh, Scotland
[5] Univ Oxford, Oxford OX1 2JD, England
[6] City Univ London, London EC1V 0HB, England
[7] Univ Oslo, N-0315 Oslo, Norway
[8] Univ Edinburgh, Edinburgh EH8 9YL, Scotland
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
Artificial intelligence; Task analysis; Merging; Machine learning; Cleaning; Data science; Iterative methods; Data wrangling; data cleaning; human-in-the-loop;
DOI
10.1109/TKDE.2022.3222538
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data wrangling tasks, such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious, manual task. We introduce AI assistants, a class of semi-automatic interactive tools that streamline data wrangling. An AI assistant guides the analyst through a specific data wrangling task by recommending a suitable data transformation that respects the constraints obtained through interaction with the analyst. We formally define the structure of AI assistants and describe how existing tools that treat data cleaning as an optimization problem fit this definition. We implement AI assistants for four common data wrangling tasks and, by leveraging the common structure they follow, make them easily accessible to data analysts in an open-source notebook environment for data science. We evaluate our AI assistants both quantitatively and qualitatively through three example scenarios, and show that the unified and interactive design makes it easy to perform tasks that would be difficult to do manually or with a fully automatic tool.
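The abstract describes an assistant that recommends the transformation that scores best under some objective, restricted to candidates that satisfy constraints gathered from the analyst. The Python sketch below illustrates that recommend, give feedback, re-recommend loop on a toy date-format task. It is an illustration under stated assumptions, not the paper's formal definition or implementation. The class `DateFormatAssistant`, the helper `means`, the use of `strptime` format strings as "transformations", and the objective (the count of values that parse) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Callable, List, Optional

# Hypothetical: a "transformation" is a strptime format string, and a
# constraint is a predicate that a candidate transformation must satisfy.
Constraint = Callable[[str], bool]


def parse_or_none(value: str, fmt: str) -> Optional[date]:
    """Parse value with fmt, returning None if it does not match."""
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return None


@dataclass
class DateFormatAssistant:
    """Hypothetical assistant: picks the best-scoring feasible format."""
    values: List[str]
    candidates: List[str]
    constraints: List[Constraint] = field(default_factory=list)

    def score(self, fmt: str) -> int:
        # Objective (assumed): number of values the format parses successfully.
        return sum(parse_or_none(v, fmt) is not None for v in self.values)

    def recommend(self) -> Optional[str]:
        # Keep only candidates consistent with all analyst feedback so far.
        feasible = [f for f in self.candidates
                    if all(c(f) for c in self.constraints)]
        if not feasible:
            return None
        # max() returns the first of several equally good candidates.
        return max(feasible, key=self.score)

    def add_constraint(self, constraint: Constraint) -> None:
        self.constraints.append(constraint)


def means(value: str, expected: date) -> Constraint:
    """Hypothetical analyst feedback: 'this raw value denotes this date'."""
    return lambda fmt: parse_or_none(value, fmt) == expected


values = ["03/04/2020", "05/06/2021", "12/11/2019"]
candidates = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d"]
assistant = DateFormatAssistant(values, candidates)

# Both day/month orders parse every value, so the tie goes to the first.
first = assistant.recommend()

# The analyst states that "03/04/2020" means 3 April 2020.
assistant.add_constraint(means("03/04/2020", date(2020, 4, 3)))
second = assistant.recommend()
```

Here the data alone cannot distinguish month-first from day-first dates. One piece of analyst feedback is enough to rule out the wrong candidate. The objective and the constraints are kept separate, so new kinds of feedback can be added without changing how candidates are scored.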
Pages: 9295–9306
Number of pages: 12