SciSciNet: A large-scale open data lake for the science of science research

Times cited: 23
Authors
Lin, Zihang [1, 2, 3, 4]
Yin, Yian [1, 2, 3, 5]
Liu, Lu [1, 2, 3]
Wang, Dashun [1, 2, 3, 5]
Affiliations
[1] Northwestern Univ, Ctr Sci Sci & Innovat, Evanston, IL 60201 USA
[2] Northwestern Univ, Northwestern Inst Complex Syst, Evanston, IL 60201 USA
[3] Northwestern Univ, Kellogg Sch Management, Evanston, IL 60201 USA
[4] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[5] Northwestern Univ, McCormick Sch Engn, Evanston, IL 60201 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
GENDER-DIFFERENCES; KNOWLEDGE TRANSFER; SOCIAL-SCIENCE; IMPACT; DISTRIBUTIONS; PUBLICATIONS; PRODUCTIVITY; TECHNOLOGY; CITATIONS; LINKAGE
DOI
10.1038/s41597-023-02198-9
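Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]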
Subject classification codes
07; 0710; 09
Abstract
The science of science has attracted growing research interest, partly due to the increasing availability of large-scale datasets capturing the inner workings of science. These datasets, and the numerous linkages among them, enable researchers to ask a range of fascinating questions about how science works and where innovation occurs. Yet as datasets grow, it becomes increasingly difficult to track available sources and linkages across datasets. Here we present SciSciNet, a large-scale open data lake for the science of science research, covering over 134M scientific publications and millions of external linkages to funding and public uses. We offer detailed documentation of pre-processing steps and analytical choices in constructing the data lake. We further supplement the data lake by computing frequently used measures in the literature, illustrating how researchers may contribute collectively to enriching the data lake. Overall, this data lake serves as an initial but useful resource for the field by lowering the barrier to entry, reducing duplication of effort in data processing and measurement, improving the robustness and replicability of empirical claims, and broadening the diversity and representation of ideas in the field.
Pages: 22
Related papers
  • [1] SciSciNet: A large-scale open data lake for the science of science research
    Zihang Lin
    Yian Yin
    Lu Liu
    Dashun Wang
    Scientific Data, 10
  • [2] The prospects of open science practices and large-scale collaborations for dream research
    Ataei, Somayeh
    Dresler, Martin
    Schoch, Sarah F.
    SLEEP, 2023, 46 (12)
  • [3] Incentive or disincentive for research data disclosure? A large-scale empirical analysis and implications for open science policy
    Kwon, Seokbeom
    Motohashi, Kazuyuki
    INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT, 2021, 60
  • [4] Large-scale environmental data science with ExaGeoStatR
    Abdulah, Sameh
    Li, Yuxiao
    Cao, Jian
    Ltaief, Hatem
    Keyes, David E.
    Genton, Marc G.
    Sun, Ying
    ENVIRONMETRICS, 2023, 34 (01)
  • [5] Large-scale science
    Ted Agres
    Genome Biology, 4 (1)
  • [6] Computational and data Grids in large-scale science and engineering
    Johnston, WE
    FUTURE GENERATION COMPUTER SYSTEMS, 2002, 18 (08): 1085-1100
  • [7] Large-scale Knowledge Representation Resources for Cognitive Science Research
    Miller, George A.
    Fillmore, Charles J.
    Palmer, Martha S.
    Lenat, Doug
    Hayes, Pat
    PROCEEDINGS OF THE TWENTY-SIXTH ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, 2004: 19
  • [8] The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
    Salatino, Angelo A.
    Thanapalasingam, Thiviyan
    Mannocci, Andrea
    Osborne, Francesco
    Motta, Enrico
    SEMANTIC WEB - ISWC 2018, PT II, 2018, 11137: 187-205
  • [9] Large-scale Data Services for Science: Present and Future Challenges
    Lamanna, Massimo
    PHYSICS OF PARTICLES AND NUCLEI LETTERS, 2016, 13 (05): 676-680
  • [10] Using computing and data grids for large-scale science and engineering
    Johnston, WE
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2001, 15 (03): 223-242