VizSmith: Automated Visualization Synthesis by Mining Data-Science Notebooks

Cited by: 5
Authors
Bavishi, Rohan [1]
Laddad, Shadaj [1]
Yoshida, Hiroaki [2]
Prasad, Mukul R. [2]
Sen, Koushik [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Fujitsu Res Amer, Sunnyvale, CA USA
Keywords
CODE
DOI
10.1109/ASE51524.2021.9678696
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Visualizations are widely used to communicate findings and make data-driven decisions. Unfortunately, creating bespoke and reproducible visualizations requires the use of procedural tools such as matplotlib. These tools present a steep learning curve, as their documentation often lacks sufficient usage examples to help beginners get started or accomplish a specific task. Forums such as StackOverflow have long helped developers search for code online and adapt it for their use. However, developers still have to sift through search results and understand the code before adapting it. We built a tool called VizSmith, which enables code reuse for visualizations by mining visualization code from Kaggle notebooks and creating a database of 7176 reusable Python functions. Given a dataset, columns to visualize, and a text query from the user, VizSmith searches this database for appropriate functions, runs them, and displays the generated visualizations to the user. At the core of VizSmith is a novel metamorphic-testing-based approach to automatically assess the reusability of functions, which improves end-to-end synthesis performance by 10% and cuts the number of execution failures by 50%.
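The search-and-run workflow and the reusability check described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `VizFunction`, `keyword_score`, `is_reusable`, and `synthesize` are hypothetical, plots are stood in for by plain dictionaries rather than matplotlib figures, ranking is a naive keyword overlap, and the metamorphic relation shown (consistently renaming columns must not change the plotted values) is only one plausible relation, not necessarily the one the authors use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Dataset = Dict[str, List[float]]   # column name -> values (stand-in for a DataFrame)
PlotSpec = Dict[str, object]       # stand-in for a rendered visualization


@dataclass
class VizFunction:
    """Hypothetical record for one mined, reusable plotting function."""
    name: str
    description: str
    n_columns: int
    run: Callable[[Dataset, Sequence[str]], PlotSpec]


def keyword_score(query: str, description: str) -> int:
    """Naive relevance: number of words shared by query and description."""
    return len(set(query.lower().split()) & set(description.lower().split()))


def is_reusable(fn: VizFunction, data: Dataset, columns: Sequence[str]) -> bool:
    """Assumed metamorphic relation: renaming the columns consistently
    must neither crash the function nor change what it plots."""
    renamed = {f"col_{i}": data[c] for i, c in enumerate(columns)}
    try:
        return fn.run(data, columns) == fn.run(renamed, list(renamed))
    except Exception:
        return False


def synthesize(db: List[VizFunction], data: Dataset, columns: Sequence[str],
               query: str, top_k: int = 3) -> List[Tuple[str, PlotSpec]]:
    """Search the database, run the best matches, and skip any that fail."""
    matches = [f for f in db
               if f.n_columns == len(columns)
               and keyword_score(query, f.description) > 0]
    matches.sort(key=lambda f: keyword_score(query, f.description), reverse=True)
    results = []
    for fn in matches[:top_k]:
        try:
            results.append((fn.name, fn.run(data, columns)))
        except Exception:
            continue  # an execution failure simply yields no visualization
    return results


def scatter(data: Dataset, cols: Sequence[str]) -> PlotSpec:
    x, y = cols
    return {"kind": "scatter", "x": list(data[x]), "y": list(data[y])}


def histogram(data: Dataset, cols: Sequence[str]) -> PlotSpec:
    (c,) = cols
    return {"kind": "histogram", "values": list(data[c])}


def hardcoded_histogram(data: Dataset, cols: Sequence[str]) -> PlotSpec:
    # Ignores its arguments and reads a fixed column, so it is not reusable.
    return {"kind": "histogram", "values": list(data["Age"])}


data = {"Age": [22, 35, 58], "Fare": [7.25, 53.1, 26.55]}
mined = [
    VizFunction("scatter", "scatter plot of two numeric columns", 2, scatter),
    VizFunction("histogram", "histogram of one numeric column", 1, histogram),
    VizFunction("hardcoded_histogram", "histogram of passenger age", 1,
                hardcoded_histogram),
]
# Keep only functions that pass the check on columns matching their arity.
database = [f for f in mined if is_reusable(f, data, list(data)[:f.n_columns])]
plots = synthesize(database, data, ["Age", "Fare"], "scatter plot")
```

In this sketch the hard-coded function is filtered out before it ever reaches the database, which mirrors the abstract's point that assessing reusability up front reduces execution failures at query time.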
Pages: 129-141
Number of pages: 13