AI Assistants: A Framework for Semi-Automated Data Wrangling

Cited by: 3
Authors
Petricek, Tomas [1]
van den Burg, Gerrit J. J. [2, 3]
Nazabal, Alfredo [2, 3]
Ceritli, Taha [2, 4, 5]
Jimenez-Ruiz, Ernesto [6, 7]
Williams, Christopher K. I. [2, 8]
Affiliations
[1] Charles Univ Prague, Prague 11000, Czech Republic
[2] Alan Turing Inst, London NW1 2DB, England
[3] Amazon Dev Ctr Scotland, Edinburgh EH1 3EG, Scotland
[4] Univ Edinburgh, Edinburgh, Scotland
[5] Univ Oxford, Oxford OX1 2JD, England
[6] City Univ London, London EC1V 0HB, England
[7] Univ Oslo, N-0315 Oslo, Norway
[8] Univ Edinburgh, Edinburgh EH8 9YL, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Artificial intelligence; Task analysis; Merging; Machine learning; Cleaning; Data science; Iterative methods; Data wrangling; data cleaning; human-in-the-loop;
DOI
10.1109/TKDE.2022.3222538
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data wrangling tasks such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious and manual task. We introduce AI assistants, a class of semi-automatic interactive tools to streamline data wrangling. An AI assistant guides the analyst through a specific data wrangling task by recommending a suitable data transformation that respects the constraints obtained through interaction with the analyst. We formally define the structure of AI assistants and describe how existing tools that treat data cleaning as an optimization problem fit the definition. We implement AI assistants for four common data wrangling tasks and make AI assistants easily accessible to data analysts in an open-source notebook environment for data science, by leveraging the common structure they follow. We evaluate our AI assistants both quantitatively and qualitatively through three example scenarios. We show that the unified and interactive design makes it easy to perform tasks that would be difficult to do manually or with a fully automatic tool.
Pages: 9295 - 9306
Number of pages: 12
Related papers
50 in total
  • [21] A Semi-Automated Workflow Solution for Data Set Publication
    Vannan, Suresh
    Beaty, Tammy W.
    Cook, Robert B.
    Wright, Daine M.
    Devarakonda, Ranjeet
    Wei, Yaxing
    Hook, Les A.
    McMurry, Benjamin F.
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2016, 5 (03):
  • [22] Road Scene Data Annotation with Semi-Automated Active Learning Framework for Convolutional Neural Networks
    Sofian, Mohd Hafiz Hilman Mohammad
    Ito, Toshio
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2022, 13 (05) : 441 - 449
  • [23] Automated and semi-automated map georeferencing
    Burt, James E.
    White, Jeremy
    Allord, Gregory
    Then, Kenneth M.
    Zhu, A-Xing
    CARTOGRAPHY AND GEOGRAPHIC INFORMATION SCIENCE, 2020, 47 (01) : 46 - 66
  • [24] A Framework for Semi-Automated Fault Detection Configuration with Automated Feature Extraction and Limits Setting
    Cai, Haoshu
    Feng, Jianshe
    Moyne, James
    Iskandar, Jimmy
    Armacost, Michael
    Li, Fei
    Lee, Jay
    2020 31ST ANNUAL SEMI ADVANCED SEMICONDUCTOR MANUFACTURING CONFERENCE (ASMC), 2020,
  • [25] Development of a Semi-Automated Segmentation Framework for Thoracic-Abdominal Organs
    Abd Rahni, Ashrani Aizzuddin
    Lewis, Emma
    Wells, Kevin
    2013 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING APPLICATIONS (IEEE ICSIPA 2013), 2013, : 232 - 236
  • [26] A semi-automated, GIS-based framework for the mapping of supraglacial hydrology
    Bash, Eleanor A.
    Shellian, Colette
    Dow, Christine F.
    Mcdermid, Greg
    Kochtitzky, Will
    Medrzycka, Dorota
    Copland, Luke
    JOURNAL OF GLACIOLOGY, 2023, 69 (276) : 708 - 722
  • [27] A novel framework for semi-automated system for grape leaf disease detection
    Navneet Kaur
    V. Devendran
    Multimedia Tools and Applications, 2024, 83 : 50733 - 50755
  • [28] A novel framework for semi-automated system for grape leaf disease detection
    Kaur, Navneet
    Devendran, V.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (17) : 50733 - 50755
  • [29] A cost-effective and semi-automated annotation framework for OCT scans
    Chakroborty, Sandipan
    Patel, Krunalkumar Ramanbhai
    Modi, Ashish Kumar
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2019, 60 (09)
  • [30] Development and validation of a semi-automated, scalable response to intervention framework in mathematics
    Eudald Correig-Fraga
    Albert Vilalta-Riera
    Cecilia Calvo-Pesce
    SN Social Sciences, 4 (2):