It's Not Just GitHub: Identifying Data and Software Sources Included in Publications

Cited by: 2
Authors
Escamilla, Emily [1]
Salsabil, Lamia [1]
Klein, Martin [2]
Wu, Jian [1]
Weigle, Michele C. [1]
Nelson, Michael L. [1]
Affiliations
[1] Old Dominion Univ, Norfolk, VA USA
[2] Los Alamos Natl Lab, Los Alamos, NM 87544 USA
Keywords
Web Archiving; GitHub; arXiv; Digital Preservation; Memento; Open Source Software
DOI
10.1007/978-3-031-43849-3_17
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Paper publications are no longer the only form of research product. Due to recent initiatives by publication venues and funding institutions, open access datasets and software products are increasingly considered research products and URIs to these products are growing more prevalent in scholarly publications. However, as with all URIs, resources found on the live Web are not permanent. Archivists and institutions including Software Heritage, Internet Archive, and Zenodo are working to preserve data and software products as valuable parts of reproducibility, a cornerstone of scientific research. While some hosting platforms are well-known and can be identified with regular expressions, there are a vast number of smaller, more niche hosting platforms utilized by researchers to host their data and software. If it is not feasible to manually identify all hosting platforms used by researchers, how can we identify URIs to open-access data and software (OADS) to aid in their preservation? We used a hybrid classifier to classify URIs as OADS URIs and non-OADS URIs. We found that URIs to Git hosting platforms (GHPs) including GitHub, GitLab, SourceForge, and Bitbucket accounted for 33% of OADS URIs. Non-GHP OADS URIs are distributed across almost 50,000 unique hostnames. We determined that using a hybrid classifier allows for the identification of OADS URIs in less common hosting platforms which can benefit discoverability for preserving datasets and software products as research products for reproducibility.
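The abstract notes that well-known hosting platforms can be identified with regular expressions. A minimal sketch of that idea follows, covering only the four GHPs named above. The pattern, the function name `ghp_platform`, and the hostname list are illustrative assumptions, not the authors' implementation, and the hybrid classifier used for less common platforms is not shown.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern (not from the paper): matches the hostnames of the
# four Git hosting platforms named in the abstract, including subdomains.
GHP_HOST_RE = re.compile(
    r"(?:^|\.)(github\.com|gitlab\.com|sourceforge\.net|bitbucket\.org)$",
    re.IGNORECASE,
)


def ghp_platform(uri):
    """Return the matched GHP hostname for a URI, or None if none matches.

    URIs without a scheme (e.g. "github.com/user/repo") have no parsed
    hostname and therefore return None in this sketch.
    """
    host = urlparse(uri).hostname or ""
    match = GHP_HOST_RE.search(host)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    for uri in (
        "https://github.com/user/repo",
        "https://project.sourceforge.net/",
        "https://zenodo.org/record/1",
    ):
        print(uri, "->", ghp_platform(uri))
```

A hostname-based check like this only reaches platforms that are known in advance, which is the limitation the abstract raises for the remaining OADS URIs spread across many hostnames.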
Pages: 195-206
Number of pages: 12