CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning

Cited by: 2
Authors
Fariha, Anna [1]
Tiwari, Ashish [2]
Meliou, Alexandra [1]
Radhakrishna, Arjun [2]
Gulwani, Sumit [2]
Affiliations
[1] Univ Massachusetts Amherst, Amherst, MA 01003 USA
[2] Microsoft, Redmond, WA USA
DOI
10.1145/3448016.3452750
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data profiling refers to the task of extracting technical metadata or profiles and has numerous applications such as data understanding, validation, integration, and cleaning. While a number of data profiling primitives exist in the literature, most of them are limited to categorical attributes. A few techniques consider numerical attributes, but they either focus on simple relationships involving a pair of attributes (e.g., correlations) or convert the continuous semantics of numerical attributes to discrete semantics, which results in information loss. To capture more complex relationships involving numerical attributes, we developed a new data-profiling primitive called conformance constraints, which can model linear arithmetic relationships involving multiple numerical attributes. We present CoCo, a system that allows interactive discovery and exploration of Conformance Constraints for understanding trends involving the numerical attributes of a dataset, with a particular focus on the application of data cleaning. Through a simple interface, CoCo enables the user to guide conformance constraint discovery according to their preferences. The user can examine to what extent a new, possibly dirty, dataset satisfies or violates the discovered conformance constraints. Further, CoCo provides suggestions for cleaning dirty data tuples: the user can interactively alter cell values and verify the effect by checking the change in conformance constraint violation due to the alteration. We demonstrate how CoCo can help in understanding trends in the data and assist users in interactive data cleaning using conformance constraints.
Pages: 2706-2710
Page count: 5