Analyzing the Quality of Twitter Data Streams

Cited by: 0
Authors
Franco Arolfo
Kevin Cortés Rodriguez
Alejandro Vaisman
Affiliation
[1] Instituto Tecnológico de Buenos Aires, Lavardén 315, Department of Information Engineering
Keywords
Data quality; Social networks; Twitter; Big data
DOI
Not available
Abstract
It is widely believed that the quality of Twitter data streams is low and unpredictable, which makes decisions based on such data unreliable. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, which are based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular and of Big Data more generally. As a first contribution, this paper redefines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that captures Twitter data streams in real time, computes their DQ, and displays the results through a wide variety of graphics. Third, using this machinery, the paper performs a thorough analysis of the DQ of Twitter streams based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several cases: unfiltered data streams, data streams filtered using a collection of keywords, and tweets classified by topic, with the DQ studied for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from which they are posted. Finally, the tool allows changing the weight of each quality dimension used in computing the overall data quality of a tweet, so that weights can be defined to fit different analysis contexts and/or user profiles. Interestingly, the study reveals that the quality of Twitter streams is higher than would have been expected.
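The abstract says that the overall quality of a tweet combines the four dimensions with adjustable weights, but it does not give the formula. The following is a minimal sketch of one plausible reading, assuming each dimension score lies in [0, 1] and the overall score is a normalized weighted average. The function name `overall_quality`, the dictionary keys, and the example scores are hypothetical and are not taken from the paper or its tool.

```python
# Illustrative sketch only: the paper's actual aggregation formula is not
# stated in the abstract. A normalized weighted average is assumed here.

DIMENSIONS = ("readability", "completeness", "usefulness", "trustworthiness")


def overall_quality(scores, weights):
    """Return the weighted average of the four dimension scores.

    scores:  dict mapping each dimension name to a value in [0, 1] (assumed range)
    weights: dict mapping each dimension name to a non-negative weight
    """
    for dim in DIMENSIONS:
        if dim not in scores or dim not in weights:
            raise ValueError(f"missing score or weight for {dim!r}")
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"score for {dim!r} must lie in [0, 1]")
        if weights[dim] < 0:
            raise ValueError(f"weight for {dim!r} must be non-negative")

    total_weight = sum(weights[dim] for dim in DIMENSIONS)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")

    # Dividing by the total weight keeps the result in [0, 1]
    # regardless of the scale the weights are given in.
    weighted_sum = sum(scores[dim] * weights[dim] for dim in DIMENSIONS)
    return weighted_sum / total_weight


# Hypothetical scores for a single tweet.
tweet_scores = {
    "readability": 0.8,
    "completeness": 0.5,
    "usefulness": 0.6,
    "trustworthiness": 0.9,
}

# Two hypothetical user profiles: equal weights, and one that
# values trustworthiness three times as much as the other dimensions.
equal_weights = {dim: 1.0 for dim in DIMENSIONS}
trust_heavy = {**equal_weights, "trustworthiness": 3.0}

print(overall_quality(tweet_scores, equal_weights))
print(overall_quality(tweet_scores, trust_heavy))
```

Under these assumptions, raising the weight of a dimension on which a tweet scores well raises its overall quality, which is how different weight profiles could fit different analysis contexts.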
Pages: 349-369
Page count: 20