Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

Cited by: 0
Authors
van Altena, Allard J. [1]
Moerland, Perry D. [1]
Zwinderman, Aeilko H. [1]
Delgado Olabarriaga, Silvia [1]
Affiliations
[1] Univ Amsterdam, Amsterdam UMC, Dept Clin Epidemiol Biostat & Bioinformat, Meibergdreef 9, NL-1105 AZ Amsterdam, Netherlands
Keywords
Big Data; Big Data Aspects; hype; biomedical literature; text mining; Lasso Regression;
DOI
10.3390/bdcc3010013
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications, some of which use the term Big Data and some of which do not. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to determine the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes (such as 'computational', 'mining', and 'challenges') and also by terms that indicate the research field, such as 'genomics'. ROC curves plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication and that this value has not changed over time.
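The abstract reports an area under the ROC curve of 0.96. As a minimal sketch of what that figure measures, and not a reproduction of the authors' pipeline, the AUC can be read as the probability that a randomly chosen Big Data document receives a higher classifier score than a randomly chosen non-Big Data document. The function name `roc_auc` and the toy labels and scores below are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive example has the higher score.
    Tied pairs count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = document uses the term Big Data, 0 = it does not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.96 indicates that the two groups of documents are almost always ranked in the right order.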
Pages: 1 / 12
Page count: 12