Analyzing the Quality of Twitter Data Streams

Authors
Franco Arolfo
Kevin Cortés Rodriguez
Alejandro Vaisman
Affiliation
[1] Instituto Tecnológico de Buenos Aires, Lavardén 315, Department of Information Engineering
Keywords
Data quality; Social networks; Twitter; Big data
DOI
Not available
Abstract
There is a general belief that the quality of Twitter data streams is low and unpredictable, which makes decisions based on such data somewhat unreliable. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, which are based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular and of Big Data more generally. As a first contribution, this paper redefines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that captures Twitter data streams in real time, computes their DQ, and displays the results through a wide variety of charts. Third, using this machinery, the paper performs a thorough analysis of the DQ of Twitter streams based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several cases: unfiltered data streams, data streams filtered using a collection of keywords, and tweets classified by topic, with the DQ studied for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from which they are posted. Finally, the tool allows changing the weight of each quality dimension considered in the computation of the overall data quality of a tweet, so that weights can be defined to fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than would have been expected.
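The abstract does not give the formula used to combine the four dimensions, so the following is only a minimal sketch of one plausible reading: a weighted average of per-dimension scores. The function name `overall_quality`, the assumption that each score lies in [0, 1], and the sample values are all hypothetical and are not taken from the paper or its tool.

```python
from typing import Mapping

# The four dimensions named in the abstract.
DIMENSIONS = ("readability", "completeness", "usefulness", "trustworthiness")


def overall_quality(scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted average of per-dimension scores (ASSUMED aggregation rule).

    Each score is assumed to lie in [0, 1]; weights are non-negative and
    are normalized here, so they do not need to sum to 1.
    """
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight


# Hypothetical scores for a single tweet.
tweet_scores = {
    "readability": 0.8,
    "completeness": 0.5,
    "usefulness": 1.0,
    "trustworthiness": 0.6,
}

# Two hypothetical weight profiles for different analysis contexts.
equal_weights = {d: 1.0 for d in DIMENSIONS}
trust_heavy = {**equal_weights, "trustworthiness": 3.0}

print(overall_quality(tweet_scores, equal_weights))
print(overall_quality(tweet_scores, trust_heavy))
```

Under this reading, changing the weight profile changes the ranking of tweets without recomputing the per-dimension scores, which is one way a tool could support different analysis contexts or user profiles.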
Pages: 349-369
Page count: 20