Using Reddit Data for Multi-label Text Classification of Twitter Users Interests

Cited by: 0
Authors
Fiallos, Angel [1]
Jimenes, Karina [2]
Affiliations
[1] Escuela Super Politecn Litoral, Guayaquil, Ecuador
[2] Univ Amer, Quito, Ecuador
Keywords
Twitter; Reddit; Word2Vec; text; classification; LDA; TF-IDF
DOI
10.1109/icedeg.2019.8734365
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automatically inferring users' interest groups is a challenging task in social network research, with applications in marketing and recommendation systems. Manually labeling documents is difficult and expensive, but it is essential for training an automatic text classifier. Currently, several approaches treat the problem as a multi-label prediction task. In this work, a methodology is proposed to automatically categorize data using Reddit and Twitter data. First, a dataset of 42,100 posts from the popular forum site Reddit is collected to train a model with labeled data. Then, a dataset of tweets from 1573 profiles, averaging 100 tweets per user, is collected to predict users' topics of interest with the trained model. Finally, the approach automatically categorizes the data with an average precision of 75.62%.
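The two-stage idea in the abstract (train on labeled Reddit posts, then predict topics of interest from a user's tweets) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes TF-IDF features, one averaged centroid per topic, and a cosine-similarity cutoff to allow several labels per user. The toy posts, the label names, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenizer; keeps purely alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def fit_idf(docs):
    # Smoothed inverse document frequency over the training posts.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(text, idf):
    # L2-normalized sparse TF-IDF vector; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train_centroids(posts, idf):
    # Average the TF-IDF vectors of all posts that share a topic label.
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for text, label in posts:
        for t, w in tfidf(text, idf).items():
            sums[label][t] += w
        counts[label] += 1
    return {lab: {t: w / counts[lab] for t, w in vec.items()}
            for lab, vec in sums.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_interests(tweets, centroids, idf, threshold=0.1):
    # Multi-label output: every topic whose similarity reaches the cutoff.
    profile = tfidf(" ".join(tweets), idf)
    scores = {lab: cosine(profile, c) for lab, c in centroids.items()}
    return sorted(lab for lab, s in scores.items() if s >= threshold)


# Hypothetical labeled posts standing in for the Reddit training set.
reddit_posts = [
    ("new gpu benchmark for gaming laptops", "technology"),
    ("python library release for data science", "technology"),
    ("the team won the football match last night", "sports"),
    ("marathon training plan for runners", "sports"),
    ("easy pasta recipe with fresh tomato sauce", "food"),
]
idf = fit_idf([text for text, _ in reddit_posts])
centroids = train_centroids(reddit_posts, idf)

# Hypothetical tweets from a single user profile.
tweets = ["watching the football match tonight", "testing a new python library"]
print(predict_interests(tweets, centroids, idf))
```

Concatenating a user's tweets into one profile vector is one simple way to move from per-document training data to per-user predictions; a per-tweet vote would be an equally plausible design.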
Pages: 324-327
Number of pages: 4