Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

Authors
van Altena, Allard J. [1]
Moerland, Perry D. [1]
Zwinderman, Aeilko H. [1]
Delgado Olabarriaga, Silvia [1]
Affiliations
[1] Univ Amsterdam, Amsterdam UMC, Dept Clin Epidemiol Biostat & Bioinformat, Meibergdreef 9, NL-1105 AZ Amsterdam, Netherlands
Keywords
Big Data; Big Data Aspects; hype; biomedical literature; text mining; Lasso Regression
DOI
10.3390/bdcc3010013
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications, some that use the term Big Data and some that do not. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to identify the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes, such as 'computational', 'mining', and 'challenges', and also by terms that indicate the research field, such as 'genomics'. The ROC curves plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication, and that this value has not changed over time.
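The abstract reports classifier quality as an area under the ROC curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of (Big Data, non-Big Data) document pairs in which the Big Data document receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper; the authors' own pipeline (Lasso regression on text features) is not reproduced here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 1 for a positive (e.g. "Big Data") document, 0 for a negative one.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positive and three negative documents.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.96 means that a randomly chosen Big Data document outscores a randomly chosen non-Big Data document about 96% of the time.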
Number of pages: 12