A framework for intelligent Twitter data analysis with non-negative matrix factorization

Cited by: 16
Authors
Casalino, Gabriella [1]
Castiello, Ciro [1]
Del Buono, Nicoletta [1]
Mencar, Corrado [1]
Affiliations
[1] Univ Bari Aldo Moro, INDAM Res Grp GNCS, Bari, Italy
Keywords
Clustering; Intelligent data analysis; Non-negative matrix factorization; Topic extraction; Twitter data
DOI
10.1108/IJWIS-11-2017-0081
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose: This paper proposes a framework for intelligent analysis of Twitter data. The framework allows users to explore a collection of tweets by extracting semantically relevant topics. In this way, it is possible to detect groups of tweets related to new technologies, events and other topics that are discovered automatically.

Design/methodology/approach: The framework is based on a three-stage process. The first stage is devoted to dataset creation: a collection of tweets is transformed into a dataset according to the vector space model. The second stage, which is the core of the framework, uses non-negative matrix factorization (NMF) to extract human-interpretable topics from the tweets, which are then clustered. The number of topics can be user-defined or discovered automatically by applying subtractive clustering as a preliminary step before factorization. In the last stage, cluster analysis and word-cloud visualization are used to enable intelligent data analysis.

Findings: The authors applied the framework to a case study of three collections of Italian tweets, with both manual and automatic selection of the number of topics. Given the high sparsity of Twitter data, the authors also investigated how different initialization mechanisms for NMF influence the factorization results. Numerical comparisons confirm that NMF can be used for clustering, as its results are comparable to those of classical clustering techniques such as spherical k-means. Visual inspection of the word clouds allowed a qualitative assessment of the results, which confirmed the expected outcomes.

Originality/value: The proposed framework enables a collaborative approach between users and computers for the intelligent analysis of Twitter data. Users are presented with interpretable descriptions of tweet clusters, which can be interactively refined through a few adjustable parameters. The resulting clusters can be used for intelligent selection of tweets, as well as for further analytics concerning the impact of products, events, etc. in the social network.
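The three-stage process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tweets, the raw term-count weighting, the random initialization, the multiplicative update rule, and the helper names `build_term_document_matrix` and `nmf` are all assumptions made for illustration, and the number of topics is fixed by hand rather than found by subtractive clustering.

```python
import numpy as np


def build_term_document_matrix(tweets):
    """Stage 1 (assumed weighting): raw term counts, one row per tweet."""
    vocab = sorted({word for tweet in tweets for word in tweet.lower().split()})
    index = {word: j for j, word in enumerate(vocab)}
    X = np.zeros((len(tweets), len(vocab)))
    for i, tweet in enumerate(tweets):
        for word in tweet.lower().split():
            X[i, index[word]] += 1.0
    return X, vocab


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Stage 2 (assumed algorithm): multiplicative updates for X ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps  # tweet-by-topic weights
    H = rng.random((k, X.shape[1])) + eps  # topic-by-term weights
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy collection with two evident themes.
tweets = [
    "new phone camera review",
    "phone battery camera test",
    "phone review battery",
    "football match goal tonight",
    "goal football team win",
    "match team goal",
]

X, vocab = build_term_document_matrix(tweets)
W, H = nmf(X, k=2)  # number of topics chosen by hand here

# Stage 3: assign each tweet to its dominant topic and list the
# heaviest terms per topic (the raw material for a word cloud).
labels = W.argmax(axis=1)
top_terms = [[vocab[j] for j in np.argsort(row)[::-1][:3]] for row in H]

for topic, terms in enumerate(top_terms):
    print(f"topic {topic}: {terms}")
print("cluster labels:", labels.tolist())
```

Each row of `H` plays the role of a topic expressed as non-negative term weights, which is what makes the factors readable as word lists; each row of `W` gives the strength of every topic in one tweet, so taking the largest entry yields a hard cluster assignment.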
Pages: 334-356
Page count: 23