TunTap: A Tunisian Dataset for Topic and Polarity Extraction in Social Media


Authors:

Djebbi, Mohamed Amine [1,2]

Ouersighni, Riadh [1,2]

Affiliations:

[1] Sci & Technol Def LR19DN01 STD, La Marsa, Tunisia

[2] CRM Mil Res Ctr, 2045 Taieb Mhiri St, Elaouina, Tunisia

Source:

COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022 | 2022 / Vol. 13501

Keywords:

Text-mining; Natural language processing; Social media computing; Machine learning; BERT model;

DOI:

10.1007/978-3-031-16014-1_40

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The massive use of social networks has recently opened up new research avenues in data mining and decision-making. One of the most relevant forms of user-generated data on social media is unstructured text that expresses users' emotions about a given topic. Analyzing this new form of writing to extract valuable information is a challenging task and could be of great interest in fields such as healthcare, business intelligence, and marketing strategies, to name but a few. This article considers topic and polarity extraction applied to Online Social Media (OSM) analysis, for the benefit of numerous application domains. Implementing sentiment analysis and topic extraction algorithms to detect the polarity of a given comment towards a certain topic requires sophisticated supervised machine learning and deep learning models and, at the same time, collecting, preparing, and annotating a huge amount of data to train those models. In this paper, we propose a dataset that can be used to extract both topic and polarity features from the dialectal messages used in everyday Tunisian electronic writing across the most popular OSM networks. We collected our data by crawling the text of posts and comments from Facebook, Twitter, and YouTube using each network's graph API. In this work, we describe the whole pipeline used to prepare our corpus, as well as the setup and results of the extensive experiments conducted to evaluate the generated dataset. To the best of our knowledge, the proposed multivariate Arabic dataset (Topic and Polarity) for the Tunisian dialect is the first of its kind introduced to the NLP community, and we have made it publicly available on GitHub (https://github.com/DescoveryAmine/TunTap).


Pages: 507-519

Page count: 13
