Big data techniques: Large-scale text analysis for scientific and journalistic research

Cited by: 39
Authors
Arcila-Calderon, Carlos [1]
Barbosa-Caro, Eduar [2]
Cabezuelo-Lorenzo, Francisco [3]
Affiliations
[1] Univ Salamanca, Fac Ciencias Sociales, Campus Miguel Unamuno,Edificio FES, Salamanca 37071, Spain
[2] Univ Norte, Via Puerto Colombia,Km 5, Barranquilla, Colombia
[3] Univ Valladolid, Fac Ciencias Sociales Jurid & Comunicac, Plaza Univ 1, Segovia 40005, Spain
Source
PROFESIONAL DE LA INFORMACION | 2016 / Vol. 25 / No. 4
Keywords
Data; Big data; Data mining; Machine learning; Topic modeling; Sentiment analysis
DOI
10.3145/epi.2016.jul.12
Chinese Library Classification
G2 [Information and knowledge dissemination]
Subject classification code
05; 0503
Abstract
This paper conceptualizes the term big data and describes its relevance to social research and journalistic practice. We explain large-scale text analysis techniques such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which can support scientific discovery in the social sciences and news production in journalism. We describe the e-infrastructure required for big data analysis using cloud computing, and we assess the main packages and libraries for information retrieval and analysis available in commercial software and in programming languages such as Python or R.
Pages: 623-631
Number of pages: 9
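The abstract lists sentiment analysis and automated content analysis among the large-scale text analysis techniques and names Python as one of the languages used. Purely as an illustration (not the authors' method or any specific library's API), the sketch below shows dictionary-based sentiment scoring plus a term-frequency count using only the Python standard library. The lexicon `LEXICON` and the helpers `tokenize`, `sentiment_score`, and `top_terms` are hypothetical; real studies would use a validated dictionary, stopword removal, and a much larger corpus.

```python
import re
from collections import Counter

# Hypothetical toy polarity lexicon; real studies use validated dictionaries.
LEXICON = {"good": 1, "great": 1, "improve": 1, "bad": -1, "crisis": -1, "fail": -1}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract alphabetic tokens (accented Spanish letters included)."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())

def sentiment_score(text: str) -> int:
    """Sum the polarity of every token that appears in the lexicon; unknown tokens count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))

def top_terms(docs: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count term frequencies across the whole corpus, a basic content-analysis step."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return counts.most_common(n)

corpus = [
    "The economy shows good signs and may improve.",
    "Critics warn the plan will fail and deepen the crisis.",
]

# Score each document: positive totals suggest positive tone, negative totals negative tone.
for doc in corpus:
    print(sentiment_score(doc), doc)
```

Because no stopwords are removed, `top_terms(corpus, 2)` returns function words such as "the" and "and"; in practice a stopword list would be applied before counting.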