A Roadmap for Enriching Jupyter Notebooks Documentation with Kaggle Data

Citations: 0
Authors
Ghahfarokhi, Mojtaba Mostafavi [1]
Jahantigh, Hamed [1]
Asadi, Alireza [1]
Kianiangolafshani, Sepehr [1]
Khademian, Ashkan [1]
Heydarnoori, Abbas [1, 2]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Bowling Green State Univ, Dept Comp Sci, Bowling Green, OH USA
Keywords
Jupyter Notebooks; Kaggle Dataset; Markdown Generation
DOI
10.1145/3644815.3644984
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in AI and data science have increased the use of Jupyter notebooks. In response, various AI-based tools have been developed to document notebooks automatically. A key challenge, however, is the absence of suitable datasets for training AI models. In this paper, we outline a roadmap for developing a dataset of (markdown, code) pairs centered on functions in Jupyter notebooks. The roadmap comprises four high-level steps: structural filtering, structural processing, conceptual filtering, and conceptual processing. The proposed roadmap is intended to yield a quality dataset for training AI models on Jupyter notebooks.
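The abstract names the roadmap's four steps but does not describe how each is carried out. As an illustration only of what a function-centered (markdown, code) pair might look like, the sketch below reads a notebook in the standard `.ipynb` JSON layout and keeps each markdown cell that directly precedes a code cell defining at least one function. The pairing rule and the names `extract_pairs` and `defines_function` are assumptions made for this example, not the authors' method.

```python
import ast
import json


def defines_function(code: str) -> bool:
    """Return True if the code parses as Python and contains a function definition."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Cells with IPython magics or shell escapes are not plain Python.
        return False
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


def extract_pairs(notebook_json: str) -> list[tuple[str, str]]:
    """Collect (markdown, code) pairs from a notebook given as a JSON string.

    Assumed pairing rule (hypothetical): a markdown cell is paired with the
    code cell immediately after it, and only if that code defines a function.
    """
    notebook = json.loads(notebook_json)
    pairs = []
    previous_markdown = None
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb stores source as a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            previous_markdown = source
        else:
            is_code = cell.get("cell_type") == "code"
            if is_code and previous_markdown is not None and defines_function(source):
                pairs.append((previous_markdown, source))
            previous_markdown = None
    return pairs
```

Calling `extract_pairs` on the text of an `.ipynb` file returns a list of string tuples. Any real pipeline would also need the filtering and processing stages the roadmap describes, which this sketch does not attempt.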
Pages: 271-272
Page count: 2
Related papers
50 in total
  • [1] DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks
    Ghahfarokhi, Mojtaba Mostafavi
    Asgari, Arash
    Abolnejadian, Mohammad
    Heydarnoori, Abbas
    [J]. 2024 IEEE/ACM 21ST INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2024, : 647 - 651
  • [2] KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle
    Quaranta, Luigi
    Calefato, Fabio
    Lanubile, Filippo
    [J]. 2021 IEEE/ACM 18TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2021), 2021, : 550 - 554
  • [3] Interactive Data Visualization in Jupyter Notebooks
    Piazentin Ono, Jorge
    Freire, Juliana
    Silva, Claudio T.
    [J]. COMPUTING IN SCIENCE & ENGINEERING, 2021, 23 (02) : 99 - 106
  • [4] Static Analysis of Data Transformations in Jupyter Notebooks
    Negrini, Luca
    Shabadi, Guruprerana
    Urban, Caterina
    [J]. PROCEEDINGS OF THE 12TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON THE STATE OF THE ART IN PROGRAM ANALYSIS, SOAP 2023, 2023, : 8 - 13
  • [5] Biotechnology Data Analysis Training with Jupyter Notebooks
    Liebal, Ulf W.
    Schimassek, Rafael
    Broderius, Iris
    Maassen, Nicole
    Vogelgesang, Alina
    Weyers, Philipp
    Blank, Lars M.
    [J]. JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION, 2023, 24 (01)
  • [6] Restoring Reproducibility of Jupyter Notebooks
    Wang, Jiawei
    Kuo, Tzu-yang
    Li, Li
    Zeller, Andreas
    [J]. 2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2020), 2020, : 288 - 289
  • [7] Literate programming with CCTBX and PyMOL in Jupyter notebooks Computing & Data Management
    Mooers, Blaine
    [J]. ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2021, 77 : A188 - A188
  • [8] Visualizing protein big data using Python and Jupyter notebooks
    Weiss, Charles J.
    [J]. BIOCHEMISTRY AND MOLECULAR BIOLOGY EDUCATION, 2022, 50 (05) : 431 - 436
  • [9] Jupyter Notebooks for Generous Archive Interfaces
    Wigham, Mari
    Melgar, Liliana
    Ordelman, Roeland
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 2766 - 2774
  • [10] Appyters: Turning Jupyter Notebooks into data-driven web apps
    Clarke, Daniel J. B.
    Jeon, Minji
    Stein, Daniel J.
    Moiseyev, Nicole
    Kropiwnicki, Eryk
    Dai, Charles
    Xie, Zhuorui
    Wojciechowicz, Megan L.
    Litz, Skylar
    Hom, Jason
    Evangelista, John Erol
    Goldman, Lucas
    Zhang, Serena
    Yoon, Christine
    Ahamed, Tahmid
    Bhuiyan, Samantha
    Cheng, Minxuan
    Karam, Julie
    Jagodnik, Kathleen M.
    Shu, Ingrid
    Lachmann, Alexander
    Ayling, Sam
    Jenkins, Sherry L.
    Ma'ayan, Avi
    [J]. PATTERNS, 2021, 2 (03):