Language independent Big-Data system for the prediction of user location on Twitter

Authors
Alonso-Lorenzo, Jaime [1]
Costa-Montenegro, Enrique [1]
Fernandez-Gavilanes, Milagros [1]
Affiliations
[1] Univ Vigo, Telemat Engn Dept, Vigo, Spain
Keywords
Big-data; Social networks; Twitter; User location; Natural Language Processing; Network theory;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Social media interactions have become increasingly important in today's world. A 2014 survey of adult Americans found that a majority of respondents use at least one social media site. Twitter, in particular, serves 310 million monthly active users, and thousands of tweets are published every second. The public nature of this data makes it a prime candidate for data mining. Twitter users publish 140-character messages and can geo-tag these tweets using a variety of methods: GPS coordinates, IP geolocation and user-declared location. However, few users disclose their location: according to our empirical findings, only between 1% and 3% of users provide location data. In this article, we aggregate information from different sources to estimate the location of any Twitter user. We use a hybrid approach that combines techniques from the fields of Natural Language Processing and network theory. Tests have been conducted on two datasets by inferring the location of each individual user and then comparing it against the actual known location of users with geolocation information. The estimation error is the distance in kilometers between the estimated and the actual location. Furthermore, we compare the relative average error per country, to account for differences in country size. Our results improve on those reported in other studies in the literature. Our approach is also independent of the language used by the user, whereas most works in the literature handle just one language or a reduced set of languages. The article also showcases the evolution of our estimation approach and the impact that the modifications had on the results.
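The error metric described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not say which distance formula or which country-size normalization was used. The sketch assumes the great-circle (haversine) distance, and it assumes that the per-country relative error divides the mean error by the radius of a circle with the same area as the country. The function names `haversine_km` and `relative_error_by_country` are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def relative_error_by_country(records, country_area_km2):
    """Mean estimation error per country, divided by a size measure.

    records: iterable of (country, est_lat, est_lon, true_lat, true_lon).
    country_area_km2: dict mapping country to its area in square km.
    Assumption: the size measure is the radius of a circle whose area
    equals the country's area; the source does not specify this.
    """
    errors = defaultdict(list)
    for country, est_lat, est_lon, true_lat, true_lon in records:
        errors[country].append(
            haversine_km(est_lat, est_lon, true_lat, true_lon))

    result = {}
    for country, errs in errors.items():
        mean_err_km = sum(errs) / len(errs)
        equiv_radius_km = math.sqrt(country_area_km2[country] / math.pi)
        result[country] = mean_err_km / equiv_radius_km
    return result
```

With a normalization of this kind, a relative error below 1 would mean that the average estimate falls within one equivalent radius of the true location, which makes errors in large and small countries comparable.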
Pages: 2437-2446
Number of pages: 10