KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle

Cited by: 21
Authors
Quaranta, Luigi [1]
Calefato, Fabio [1]
Lanubile, Filippo [1]
Affiliations
[1] Univ Bari, Bari, Italy
Keywords
open dataset; repository; Kaggle; computational notebook; Jupyter
DOI
10.1109/MSR52588.2021.00072
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the research community cannot yet count on a large, curated dataset of computational notebooks. In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners at any level of expertise. We describe how we built KGTorrent and provide instructions on how to use it and how to refresh the collection to keep it up to date. Our vision is that the research community will use KGTorrent to study how data scientists, especially practitioners, use Jupyter Notebook in the wild and to identify potential shortcomings that can inform the design of its future extensions.
Pages: 550-554
Page count: 5