Language independent Big-Data system for the prediction of user location on Twitter

Authors
Alonso-Lorenzo, Jaime [1]
Costa-Montenegro, Enrique [1]
Fernandez-Gavilanes, Milagros [1]
Affiliations
[1] Univ Vigo, Telemat Engn Dept, Vigo, Spain
Keywords
Big-data; Social networks; Twitter; User location; Natural Language Processing; Network theory;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Social media interactions have become increasingly important in today's world. A survey conducted in 2014 among adult Americans found that a majority of those surveyed use at least one social media site. Twitter, in particular, serves 310 million monthly active users, and thousands of tweets are published every second. The public nature of this data makes it a prime candidate for data mining. Twitter users publish 140-character messages and can geo-tag these tweets using a variety of methods: GPS coordinates, IP geolocation and user-declared location. However, few users disclose their location: according to our empirical findings, only between 1% and 3% of users provide location data. In this article, we aim to aggregate information from different sources to estimate the location of any Twitter user. We use a hybrid approach that combines techniques from the fields of Natural Language Processing and network theory. Tests have been conducted on two datasets, inferring the location of each individual user and then comparing it against the actual known location of users with geolocation information. The estimation error is the distance in kilometers between the estimated and the actual location. Furthermore, the relative average error per country is compared, to account for differences in country size. Our results improve on those reported in other studies in the literature. A distinguishing feature of our approach is that it is independent of the language used by the user, whereas most works in the literature handle a single language or a small set of languages. The article also showcases the evolution of our estimation approach and the impact that the modifications had on the results.
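The error metric described in the abstract can be illustrated with a short sketch. It is not taken from the article: it assumes the distance is a great-circle (haversine) distance with a mean Earth radius of 6371 km, and it assumes that the "relative" per-country error is the average error divided by a characteristic country length. The abstract does not state how the normalization is done, so `relative_error_per_country` and its `country_scale_km` argument are hypothetical names chosen for illustration.

```python
import math
from collections import defaultdict

# Mean Earth radius in km. This value is an assumption, not from the article.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def relative_error_per_country(records, country_scale_km):
    """Average estimation error per country, divided by a country length scale.

    records: iterable of (country, (est_lat, est_lon), (act_lat, act_lon)).
    country_scale_km: hypothetical mapping of country -> length scale in km,
        for example the square root of the country's area. The article's
        actual normalization is not given in the abstract.
    """
    errors = defaultdict(list)
    for country, estimated, actual in records:
        errors[country].append(haversine_km(*estimated, *actual))
    return {
        country: (sum(errs) / len(errs)) / country_scale_km[country]
        for country, errs in errors.items()
    }
```

Dividing by a length scale makes a given error in kilometers count for less in a large country than in a small one, which is the kind of adjustment the abstract motivates; any other size measure could be substituted for `country_scale_km`.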
Pages: 2437-2446
Page count: 10