TunTap: A Tunisian Dataset for Topic and Polarity Extraction in Social Media

Authors
Djebbi, Mohamed Amine [1, 2]
Ouersighni, Riadh [1, 2]
Affiliations
[1] Sci & Technol Def LR19DN01 STD, La Marsa, Tunisia
[2] CRM Mil Res Ctr, 2045 Taieb Mhiri St, Elaouina, Tunisia
Keywords
Text-mining; Natural language processing; Social media computing; Machine learning; BERT model;
DOI
10.1007/978-3-031-16014-1_40
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The massive use of social networks has recently opened up new research avenues in data mining and decision-making. One of the most relevant forms of user-generated data in social media is unstructured text that expresses users' emotions about a given topic. Analyzing this new form of writing to extract valuable information is a challenging task and could be of great interest in several fields, such as healthcare, business intelligence, and marketing strategy, to name but a few. This article considers topic and polarity extraction applied to Online Social Media (OSM) analysis, for the benefit of numerous application domains. Implementing sentiment analysis and topic extraction algorithms to detect the polarity of a given comment towards a certain topic requires sophisticated supervised machine learning and deep learning models and, at the same time, collecting, preparing, and annotating a huge amount of data to train those models. In this paper, we propose a dedicated dataset that can be used to extract both topic and polarity features from dialectal messages used in everyday Tunisian electronic writing across the most popular OSM networks. We collected our data by crawling the text of posts and comments from Facebook, Twitter, and YouTube using the respective network graph APIs. In this work, we describe the whole pipeline used to prepare our corpus, as well as the setup and results of the extensive experiments conducted to evaluate the generated dataset. To the best of our knowledge, the proposed multivariate Arabic dataset (topic and polarity) of the Tunisian dialect is the first of its kind introduced to the NLP community, and we have made it publicly available on GitHub (https://github.com/DescoveryAmine/TunTap).
Pages: 507-519
Number of pages: 13