Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

Times cited: 0
Authors
van Altena, Allard J. [1 ]
Moerland, Perry D. [1 ]
Zwinderman, Aeilko H. [1 ]
Delgado Olabarriaga, Silvia [1 ]
Affiliations
[1] Univ Amsterdam, Amsterdam UMC, Dept Clin Epidemiol Biostat & Bioinformat, Meibergdreef 9, NL-1105 AZ Amsterdam, Netherlands
Keywords
Big Data; Big Data Aspects; hype; biomedical literature; text mining; Lasso Regression
DOI
10.3390/bdcc3010013
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications, some of which use the term Big Data and some of which do not. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to identify the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes, such as 'computational', 'mining', and 'challenges', and also by terms that indicate the research field, such as 'genomics'. ROC curves plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication and that this value has not changed over time.
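As a reading aid for the reported area under the ROC curve, the following minimal sketch shows how such a value can be computed from classifier scores. It uses the pairwise interpretation of the AUC: the probability that a randomly chosen Big Data document receives a higher score than a randomly chosen non-Big Data document, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study; the authors' own pipeline is not reproduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a (hypothetical) Big Data document, 0 otherwise.
    scores: classifier scores, higher meaning "more likely Big Data".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one document from each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, so a value of 0.96 indicates that the two groups of documents are ranked almost perfectly apart.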
Pages: 1 / 12
Number of pages: 12