TunTap: A Tunisian Dataset for Topic and Polarity Extraction in Social Media

Authors
Djebbi, Mohamed Amine [1, 2]
Ouersighni, Riadh [1, 2]
Affiliations
[1] Sci & Technol Def LR19DN01 STD, La Marsa, Tunisia
[2] CRM Mil Res Ctr, 2045 Taieb Mhiri St, Elaouina, Tunisia
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022 | 2022 / Vol. 13501
Keywords
Text-mining; Natural language processing; Social media computing; Machine learning; BERT model;
DOI
10.1007/978-3-031-16014-1_40
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The massive usage of social networks has recently opened up new research avenues in the fields of data mining and decision-making. One of the most relevant forms of data generated by users in social media is unstructured text that expresses their emotions on a given topic. Analyzing this new form of writing to extract valuable information is a challenging task, and could be of great interest in several fields such as healthcare, business intelligence, and marketing strategies, to name but a few. This article considers topic and polarity extraction applied to Online Social Media (OSM) analysis, for the benefit of numerous application domains. Implementing sentiment analysis and topic extraction algorithms to detect the polarity of a given comment towards a certain topic requires sophisticated supervised machine learning and deep learning models and, at the same time, collecting, preparing and annotating a huge amount of data to train those models. In this paper, we propose a dedicated dataset that can be used to extract both topic and polarity features from dialectal messages used in Tunisian daily electronic writing across the most popular OSM networks. We collected our data by crawling the text of posts and comments from Facebook, Twitter and YouTube using the related network graph APIs. In this work, we describe the whole pipeline used to prepare our corpus, as well as the setup and results of the extensive experiments conducted to evaluate the generated dataset. To our knowledge, the proposed multivariate Arabic dataset (Topic and Polarity) of Tunisian dialect is the first of its kind introduced to the NLP community, and we have made it publicly available on GitHub (https://github.com/DescoveryAmine/TunTap).
Pages: 507-519
Page count: 13