Using Reddit Data for Multi-label Text Classification of Twitter Users Interests

Cited by: 0
Authors
Fiallos, Angel [1]
Jimenes, Karina [2]
Affiliations
[1] Escuela Super Politecn Litoral, Guayaquil, Ecuador
[2] Univ Amer, Quito, Ecuador
Keywords
Twitter; Reddit; Word2Vec; text; classification; LDA; TF-IDF
DOI
10.1109/icedeg.2019.8734365
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automatically inferring users' interest groups is a challenging task in social network research, with applications in marketing and recommendation systems. Manual labeling of documents is difficult and expensive, yet labeled data are essential for training an automatic text classifier. Currently, several approaches treat the problem as a multi-label prediction task. In this work, we propose a methodology that automatically categorizes data by combining Reddit and Twitter data. First, a dataset of 42,100 posts from the popular forum site Reddit is collected to train a model on labeled data. Then, a dataset of tweets from 1,573 profiles, with an average of 100 tweets per user, is collected to predict users' topics of interest with the trained model. Finally, we were able to automatically categorize data with an average precision of 75.62%.
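The abstract only outlines the pipeline (train on labeled Reddit posts, then predict several topics per Twitter user). The following is a minimal, stdlib-only sketch of that general idea under explicit assumptions; it is not the authors' method. It swaps in TF-IDF nearest-centroid scoring instead of the Word2Vec/LDA models named in the keywords, uses a hypothetical toy corpus (`TRAIN_POSTS`) in which each post carries its subreddit's topic as its label, assigns every label whose cosine similarity reaches a hypothetical `threshold` of 0.1, and reads "average precision" as per-user precision averaged over users, which the abstract does not define.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (assumption): each Reddit post is labeled with its subreddit's topic.
TRAIN_POSTS = [
    ("new gpu benchmark gaming performance", "technology"),
    ("smartphone release battery camera review", "technology"),
    ("championship final goal match team", "sports"),
    ("team wins league title after match", "sports"),
    ("new album release concert tour", "music"),
    ("guitar concert band tour tickets", "music"),
]


def tokenize(text):
    # Lowercase whitespace tokenizer; a real pipeline would also drop stop words, URLs, etc.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def normalize(vec):
    # Scale a sparse vector to unit L2 length (an all-zero vector becomes empty).
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf(tokens, idf):
    # Term frequency times IDF; terms unseen during training are ignored.
    tf = Counter(tokens)
    return normalize({t: c * idf[t] for t, c in tf.items() if t in idf})


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train_centroids(posts):
    # One unit-length centroid per label, built from the TF-IDF vectors of its posts.
    docs = [tokenize(text) for text, _ in posts]
    idf = build_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, (_, label) in zip(docs, posts):
        for t, w in tfidf(tokens, idf).items():
            sums[label][t] += w
    return idf, {label: normalize(vec) for label, vec in sums.items()}


def predict_user(tweets, idf, centroids, threshold=0.1):
    # Pool all of a user's tweets into one document and keep every label whose
    # similarity reaches the (hypothetical) threshold, giving a multi-label output.
    tokens = [t for tweet in tweets for t in tokenize(tweet)]
    vec = tfidf(tokens, idf)
    return sorted(label for label, c in centroids.items() if cosine(vec, c) >= threshold)


def mean_precision(predicted, gold):
    # Per-user precision |pred & gold| / |pred|, averaged over users;
    # a user with no predicted labels contributes 0.
    if not predicted:
        return 0.0
    total = 0.0
    for pred, true in zip(predicted, gold):
        total += (len(set(pred) & set(true)) / len(pred)) if pred else 0.0
    return total / len(predicted)


if __name__ == "__main__":
    idf, centroids = train_centroids(TRAIN_POSTS)
    users = [
        ["great match today", "our team scored a goal"],
        ["concert tickets for the tour", "watching the match with the team"],
    ]
    predictions = [predict_user(tweets, idf, centroids) for tweets in users]
    print(predictions)
    print(mean_precision(predictions, [["sports"], ["music"]]))
```

Pooling a user's tweets before scoring mirrors the per-user prediction step described in the abstract, and keeping every label above a similarity threshold (rather than a single best label) is what makes the output multi-label; the threshold value would need tuning on held-out data.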
Pages: 324-327
Number of pages: 4