Academia versus social media: A psycho-linguistic analysis

被引:1
|
作者
Thin Nguyen [1 ]
Venkatesh, Svetha [1 ]
Dinh Phung [1 ]
机构
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic, Australia
关键词
Social media analytics; Large-scale computing; Psycho-linguistics; Academia; Computer-mediated communication; WIKIPEDIA;
D O I
10.1016/j.jocs.2017.08.011
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Publication pressure has influenced the way scientists report their experimental results. Recently it has been found that scientific outcomes have been exaggerated or distorted (spin) to hopefully be published. Apart from investigating the content to look for spins, language styles has been proven to be the good traces. For example, the use of words in emotion lexicons has been used to interpret exaggeration and overstatement in academia. This work adapts a data-driven approach to explore a comprehensive set of psycho-linguistic features for a large corpus of PubMed papers published for the last four decades. The language features for other media - online encyclopedia (Wikipedia), online diaries (web-logs), online forums (Reddit), and micro-blogs (Twitter) - are also extracted. Several binary classifications are employed to discover linguistic predictors of scientific abstracts versus other media as well as strong predictors of scientific articles in different cohorts of impact factors and author affiliations. Trends of language styles expressed in scientific articles over the course of 40 years has also been discovered, providing the evolution of academic writing for the period of time. The study demonstrates advances in lightning-fast cluster computing on dealing with large scale data, consisting of 5.8 terabytes of data containing 3.6 billion records from all the media. The good performance of the advanced cluster computing framework suggests the potential of pattern recognition in data at scale. (C) 2017 Elsevier B.V. All rights reserved.
引用
收藏
页码:228 / 237
页数:10
相关论文
共 50 条