A Python library for exploratory data analysis on twitter data based on tokens and aggregated origin-destination information

Cited by: 3
Authors
Graff, Mario [1 ,3 ,4 ]
Moctezuma, Daniela [2 ,3 ]
Miranda-Jimenez, Sabino [1 ,3 ]
Tellez, Eric S. [1 ,3 ]
Affiliations
[1] INFOTEC Ctr Invest & Innovac Tecnol Informac & Co, Circuito Tecnopolo 112,Fracc Tecnopolo Pocitos 2, Aguascalientes 20313, Aguascalientes, Mexico
[2] CentroGEO Ctr Invest Ciencias Informac Geoespacia, Circuito Tecnopolo Norte 117, Aguascalientes 20313, Aguascalientes, Mexico
[3] CONACyT Consejo Nacl Ciencia & Tecnol, Direcc Catedras, Insurgentes Sur 1582, Mexico City 03940, DF, Mexico
[4] Colgate Univ, Dept Comp Sci, 13 Oak Dr, Hamilton, NY 13346 USA
Keywords
Twitter exploratory analysis; Mobility patterns; Open-source Python library
DOI
10.1016/j.cageo.2021.105012
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Twitter is perhaps the social media platform most amenable to research. Obtaining information from it requires only a few steps, and plenty of libraries can help in this regard. Nonetheless, determining whether a particular event is expressed on Twitter is a challenging task that requires a considerable collection of tweets. This proposal aims to facilitate the process of mining events on Twitter for interested researchers by opening a collection of processed information taken from Twitter since December 2015. The events could be related to natural disasters, health issues, and people's mobility, among other topics that can be studied with the proposed library. Different applications are presented in this contribution to illustrate the library's capabilities: an exploratory analysis of the topics discovered in tweets, a study on similarity among dialects of the Spanish language, and a mobility report on different countries. In summary, the Python library presented is applied to different domains and retrieves a plethora of information: daily frequencies of words and word bi-grams for the Arabic, English, Spanish, and Russian languages, as well as mobility information on the number of trips among locations for more than 200 countries or territories.
Pages: 11
Related papers
50 items in total
  • [31] Immunopeptidomics toolkit library (IPTK): a python-based modular toolbox for analyzing immunopeptidomics data
    ElAbd, Hesham
    Degenhardt, Frauke
    Koudelka, Tomas
    Kamps, Ann-Kristin
    Tholey, Andreas
    Bacher, Petra
    Lenz, Tobias L.
    Franke, Andre
    Wendorff, Mareike
    BMC BIOINFORMATICS, 2021, 22 (01)
  • [32] Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis
    Zhang, Yuanchao
    Kim, Man S.
    Reichenberger, Erin R.
    Stear, Ben
    Taylor, Deanne M.
    PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (04)
  • [33] pyEIA: A Python-based framework for data analysis of electrochemical methods for immunoassays
    Vishart, Jonas Lynge
    Castillo-Leon, Jaime
    Svendsen, Winnie E.
    SOFTWAREX, 2021, 15
  • [34] A Python package based on robust statistical analysis for serial crystallography data processing
    Hadian-Jazi, Marjan
    Sadri, Alireza
    ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY, 2023, 79 : 820 - 829
  • [35] Clinical analysis of influenza chills and fever symptoms based on Python data mining
    Wang, Shiyu
    Wei, Xueqin
    Zhang, Yunchao
    Tang, Yao
    WIENER KLINISCHE WOCHENSCHRIFT, 2023, 135 : S801 - S801
  • [36] Clinical analysis of influenza chills and fever symptoms based on Python data mining
    Zhu, Zhigang
    Zhang, Cheng
    WIENER KLINISCHE WOCHENSCHRIFT, 2023, 135 : S801 - S802
  • [37] NeoAnalysis: a Python-based toolbox for quick electrophysiological data processing and analysis
    Zhang, Bo
    Dai, Ji
    Zhang, Tao
    BIOMEDICAL ENGINEERING ONLINE, 2017, 16
  • [38] A Python library for probabilistic analysis of single-cell omics data
    Adam Gayoso
    Romain Lopez
    Galen Xing
    Pierre Boyeau
    Valeh Valiollah Pour Amiri
    Justin Hong
    Katherine Wu
    Michael Jayasuriya
    Edouard Mehlman
    Maxime Langevin
    Yining Liu
    Jules Samaran
    Gabriel Misrachi
    Achille Nazaret
    Oscar Clivio
    Chenling Xu
    Tal Ashuach
    Mariano Gabitto
    Mohammad Lotfollahi
    Valentine Svensson
    Eduardo da Veiga Beltrame
    Vitalii Kleshchevnikov
    Carlos Talavera-López
    Lior Pachter
    Fabian J. Theis
    Aaron Streets
    Michael I. Jordan
    Jeffrey Regier
    Nir Yosef
    Nature Biotechnology, 2022, 40 : 163 - 166
  • [39] rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics
    Bonet, Jaume
    Harteveld, Zander
    Sesterhenn, Fabian
    Scheck, Andreas
    Correia, Bruno E.
    BMC BIOINFORMATICS, 2019, 20 (1)
  • [40] Deriving fine-scale models of human mobility from aggregated origin-destination flow data
    Ciavarella C.
    Ferguson N.M.
    PLoS Computational Biology, 2021, 17 (02)