SciSciNet: A large-scale open data lake for the science of science research

Authors
Zihang Lin
Yian Yin
Lu Liu
Dashun Wang
Affiliations
[1] Northwestern University, Center for Science of Science and Innovation
[2] Northwestern University, Northwestern Institute on Complex Systems
[3] Northwestern University, Kellogg School of Management
[4] Fudan University, School of Computer Science
[5] Northwestern University, McCormick School of Engineering
Abstract
The science of science has attracted growing research interest, partly due to the increasing availability of large-scale datasets capturing the inner workings of science. These datasets, and the numerous linkages among them, enable researchers to ask a range of fascinating questions about how science works and where innovation occurs. Yet as datasets grow, it becomes increasingly difficult to track available sources and the linkages across them. Here we present SciSciNet, a large-scale open data lake for science of science research, covering over 134M scientific publications and millions of external linkages to funding and public uses. We offer detailed documentation of the pre-processing steps and analytical choices made in constructing the data lake. We further supplement the data lake by computing measures frequently used in the literature, illustrating how researchers may contribute collectively to enriching it. Overall, this data lake serves as an initial but useful resource for the field by lowering the barrier to entry, reducing duplication of effort in data processing and measurement, improving the robustness and replicability of empirical claims, and broadening the diversity and representation of ideas in the field.
Related papers
50 in total
  • [1] SciSciNet: A large-scale open data lake for the science of science research
    Lin, Zihang
    Yin, Yian
    Liu, Lu
    Wang, Dashun
    [J]. SCIENTIFIC DATA, 2023, 10 (01)
  • [2] The prospects of open science practices and large-scale collaborations for dream research
    Ataei, Somayeh
    Dresler, Martin
    Schoch, Sarah F.
    [J]. SLEEP, 2023, 46 (12)
  • [3] Incentive or disincentive for research data disclosure? A large-scale empirical analysis and implications for open science policy
    Kwon, Seokbeom
    Motohashi, Kazuyuki
    [J]. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT, 2021, 60
  • [4] Large-scale environmental data science with ExaGeoStatR
    Abdulah, Sameh
    Li, Yuxiao
    Cao, Jian
    Ltaief, Hatem
    Keyes, David E.
    Genton, Marc G.
    Sun, Ying
    [J]. ENVIRONMETRICS, 2023, 34 (01)
  • [5] Large-scale science
    Ted Agres
    [J]. Genome Biology, 4 (1)
  • [6] Computational and data Grids in large-scale science and engineering
    Johnston, WE
    [J]. FUTURE GENERATION COMPUTER SYSTEMS, 2002, 18 (08): 1085-1100
  • [7] Large-scale Knowledge Representation Resources for Cognitive Science Research
    Miller, George A.
    Fillmore, Charles J.
    Palmer, Martha S.
    Lenat, Doug
    Hayes, Pat
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, 2004: 19
  • [8] The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
    Salatino, Angelo A.
    Thanapalasingam, Thiviyan
    Mannocci, Andrea
    Osborne, Francesco
    Motta, Enrico
    [J]. SEMANTIC WEB - ISWC 2018, PT II, 2018, 11137: 187-205
  • [9] Large-scale Data Services for Science: Present and Future Challenges
    Lamanna, Massimo
    [J]. PHYSICS OF PARTICLES AND NUCLEI LETTERS, 2016, 13 (05): 676-680
  • [10] Using computing and data grids for large-scale science and engineering
    Johnston, WE
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2001, 15 (03): 223-242