TunTap: A Tunisian Dataset for Topic and Polarity Extraction in Social Media

Authors
Djebbi, Mohamed Amine [1, 2]
Ouersighni, Riadh [1, 2]
Affiliations
[1] Sci & Technol Def LR19DN01 STD, La Marsa, Tunisia
[2] CRM Mil Res Ctr, 2045 Taieb Mhiri St, Elaouina, Tunisia
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022 | 2022 / Volume 13501
Keywords
Text-mining; Natural language processing; Social media computing; Machine learning; BERT model;
DOI
10.1007/978-3-031-16014-1_40
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The massive use of social networks has recently opened up new research avenues in data mining and decision-making. One of the most relevant forms of user-generated data on social media is unstructured text that conveys users' emotions about a given topic. Analyzing this new form of writing to extract valuable information is a challenging task and could be of great interest in several fields, such as healthcare, business intelligence, and marketing, to name but a few. This article considers topic and polarity extraction applied to Online Social Media (OSM) analysis, for the benefit of numerous application domains. Implementing sentiment analysis and topic extraction algorithms to detect the polarity of a given comment towards a certain topic requires sophisticated supervised machine learning and deep learning models and, at the same time, collecting, preparing, and annotating a huge amount of data to train those models. In this paper, we propose a dedicated dataset that can be used to extract both topic and polarity features from dialectal messages used in everyday Tunisian electronic writing across the most popular OSM networks. We collected our data by crawling the text of posts and comments from Facebook, Twitter, and YouTube using the corresponding network graph APIs. In this work, we describe the whole pipeline used to prepare our corpus, as well as the setup and results of the extensive experiments conducted to evaluate the generated dataset. To the best of our knowledge, the proposed multivariate Arabic dataset (Topic and Polarity) of the Tunisian dialect is the first of its kind introduced to the NLP community, and we have made it publicly available on GitHub (https://github.com/DescoveryAmine/TunTap).
Pages: 507-519
Number of pages: 13