Semi-supervised labeling: a proposed methodology for labeling the twitter datasets

Cited by: 0
Authors
Tabassum Gull Jan
Surinder Singh Khurana
Munish Kumar
Affiliations
[1] Central University of Punjab, Department of Computer Science & Technology
[2] Maharaja Ranjit Singh Punjab Technical University, Department of Computational Sciences
Source
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (06)
Keywords
Twitter; Spam labeling; Clustering; Spam detection; Tweets
DOI
Not available
Abstract
Twitter has become a popular microblogging and social media platform for news and discussion. The dramatic growth in its use has been accompanied by a dramatic increase in spam on the platform. Supervised machine learning requires a labeled Twitter dataset, and preparing one demands considerable human effort. It is therefore desirable to design a semi-supervised technique for labeling newly collected datasets. This motivated the authors to propose an efficient approach for preparing labeled datasets that saves time and reduces human error. The proposed approach relies on features that are readily available in real time, for better performance and wider applicability. The work collects the most recent tweets of a user using Twitter streaming and prepares a recent Twitter dataset. A semi-supervised machine learning algorithm based on the self-training technique was then designed for labeling the tweets, with a semi-supervised support vector machine and a semi-supervised decision tree as base classifiers. In addition, the authors applied the K-means clustering algorithm to the tweets based on tweet content. The proposed approach is an ensemble of semi-supervised and unsupervised learning, in which the semi-supervised algorithms were found to predict more accurately than the unsupervised one. To assign labels to the tweets, the authors implemented majority voting: the label predicted by the majority voting classifier is the label assigned to each tweet in the dataset. A maximum accuracy of 99.0% is reported using the majority voting classifier for spam labeling.
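As a rough illustration of the labeling idea summarized in the abstract, the following minimal Python sketch shows a self-training loop followed by majority voting. It is not the authors' implementation. A simple nearest-centroid classifier stands in for the semi-supervised SVM and decision tree, and the two-number feature vectors (hypothetically, URL count and hashtag count), the function names, the `min_margin` confidence threshold, and the two extra vote lists are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def centroids(labeled):
    """Mean feature vector per class, from (features, label) pairs."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(cents, features):
    """Return (label, margin): nearest class and its lead over the runner-up."""
    ranked = sorted((dist(features, c), label) for label, c in cents.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]


def self_train(seed, unlabeled, min_margin=1.0, max_rounds=10):
    """Self-training: pseudo-label confident items, refit, and repeat."""
    labeled = list(seed)
    assigned = {}
    for _ in range(max_rounds):
        cents = centroids(labeled)
        newly = {}
        for i, features in enumerate(unlabeled):
            if i in assigned:
                continue
            label, margin = predict(cents, features)
            if margin >= min_margin:  # keep only confident predictions
                newly[i] = label
        if not newly:
            break
        assigned.update(newly)
        labeled.extend((unlabeled[i], label) for i, label in newly.items())
    # Anything still unlabeled gets the final model's best guess.
    cents = centroids(labeled)
    return [
        assigned[i] if i in assigned else predict(cents, f)[0]
        for i, f in enumerate(unlabeled)
    ]


def majority_vote(*label_lists):
    """Per-item majority label across several labelers' outputs."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_lists)]


# Hypothetical seed set: ([url_count, hashtag_count], label).
seed = [([0, 1], "ham"), ([1, 0], "ham"), ([4, 5], "spam"), ([5, 4], "spam")]
unlabeled = [[0, 0], [5, 5], [2, 2], [3, 4]]

self_labels = self_train(seed, unlabeled)
# Hypothetical outputs of a second classifier and of a clustering step.
second_labels = ["ham", "spam", "spam", "spam"]
cluster_labels = ["ham", "spam", "ham", "ham"]

final_labels = majority_vote(self_labels, second_labels, cluster_labels)
print(final_labels)
```

With three voters and two classes a tie cannot occur; with an even number of voters a tie-breaking rule would have to be chosen.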
Pages: 7669 - 7683
Page count: 14
Related papers
50 in total
  • [1] Semi-supervised labeling: a proposed methodology for labeling the twitter datasets
    Jan, Tabassum Gull
    Khurana, Surinder Singh
    Kumar, Munish
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (06) : 7669 - 7683
  • [2] Exploring Semi-Supervised Methods for Labeling Support in Multimodal Datasets
    Diete, Alexander
    Sztyler, Timo
    Stuckenschmidt, Heiner
    [J]. SENSORS, 2018, 18 (08)
  • [3] Semi-supervised Mesh Segmentation and Labeling
    Lv, Jiajun
    Chen, Xinlei
    Huang, Jin
    Bao, Hujun
    [J]. COMPUTER GRAPHICS FORUM, 2012, 31 (07) : 2241 - 2248
  • [4] Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning
    Cascante-Bonilla, Paola
    Tan, Fuwen
    Qi, Yanjun
    Ordonez, Vicente
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 6912 - 6920
  • [5] Semi-supervised Multitask Learning for Sequence Labeling
    Rei, Marek
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017 : 2121 - 2130
  • [6] New Labeling Strategy for Semi-supervised Document Categorization
    Zhu, Yan
    Jing, Liping
    Yu, Jian
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2009, 5914 : 134 - 145
  • [7] Tracking with Context as a Semi-supervised Learning and Labeling Problem
    Cerman, Lukas
    Hlavac, Vaclav
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012 : 2124 - 2127
  • [8] Boosting semi-supervised learning with Contrastive Complementary Labeling
    Deng, Qinyi
    Guo, Yong
    Yang, Zhibang
    Pan, Haolin
    Chen, Jian
    [J]. NEURAL NETWORKS, 2024, 170 : 417 - 426
  • [9] Instance labeling in semi-supervised learning with meaning values of words
    Altinel, Berna
    Ganiz, Murat Can
    Diri, Banu
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 62 : 152 - 163
  • [10] Unsupervised Selective Labeling for More Effective Semi-supervised Learning
    Wang, Xudong
    Lian, Long
    Yu, Stella X.
    [J]. COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690 : 427 - 445