Language independent Big-Data system for the prediction of user location on Twitter

Times cited: 0
Authors
Alonso-Lorenzo, Jaime [1]
Costa-Montenegro, Enrique [1]
Fernandez-Gavilanes, Milagros [1]
Affiliation
[1] University of Vigo, Department of Telematics Engineering, Vigo, Spain
Keywords
Big-data; Social networks; Twitter; User location; Natural Language Processing; Network theory
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Social media interactions have become increasingly important in today's world. A survey conducted in 2014 among adult Americans found that a majority of respondents use at least one social media site. Twitter, in particular, serves 310 million monthly active users, and thousands of tweets are published every second. The public nature of this data makes it a prime candidate for data mining. Twitter users publish messages of up to 140 characters and can geo-tag these tweets using a variety of methods: GPS coordinates, IP geolocation, and user-declared location. However, few users disclose their location. According to our empirical findings, only between 1% and 3% of users provide location data. In this article, we aim to aggregate information from different sources to estimate the location of any Twitter user. We use a hybrid approach that combines techniques from Natural Language Processing and network theory. Tests were conducted on two datasets by inferring the location of each individual user and then comparing it against the actual known location of users with geolocation information. The estimation error is the distance in kilometers between the estimated and actual locations. We also compare the relative average error per country to account for differences in country size. Our results improve on those reported in other studies in the literature. A distinguishing feature of our approach is that it is independent of the language used by the user, whereas most works in the literature handle only one language or a reduced set of languages. The article also shows how our estimation approach evolved and what impact each modification had on the results.
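The abstract defines the per-user error as the distance in kilometers between the estimated and actual locations. It also mentions a relative average error per country. The sketch below illustrates both metrics under stated assumptions.

- It uses great-circle (haversine) distance with a mean Earth radius of 6371 km. This is an assumption, since the abstract does not name the distance formula.
- It divides each country's mean error by a hypothetical `country_scale_km` value. The paper's actual normalization is not given here, so this is only an illustrative stand-in.

```python
import math
from collections import defaultdict

# Mean Earth radius in km (assumption: the abstract does not state the value used).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def relative_error_by_country(records, country_scale_km):
    """Mean estimation error per country, divided by a per-country size measure.

    records: iterable of (country, est_lat, est_lon, true_lat, true_lon).
    country_scale_km: hypothetical dict mapping country -> characteristic size in km
    (an illustrative normalization, not necessarily the paper's).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for country, est_lat, est_lon, true_lat, true_lon in records:
        totals[country] += haversine_km(est_lat, est_lon, true_lat, true_lon)
        counts[country] += 1
    return {c: (totals[c] / counts[c]) / country_scale_km[c] for c in totals}
```

For example, `haversine_km(0, 0, 0, 1)` gives roughly 111.2 km, which is one degree of longitude along the equator. Dividing by a country-level scale lets a 200 km error in a small country weigh more than the same error in a large one. That is the intent the abstract describes, though the scale used here is only a placeholder.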
Pages: 2437-2446
Page count: 10