WASM: A Dataset for Hashtag Recommendation for Arabic Tweets

Cited by: 0
Authors
Al-Shaibani, Maged S. [1]
Luqman, Hamzah [1,2]
Al-Ghofaily, Abdulaziz S. [1]
Al-Najim, Abdullatif A. [1]
Affiliations
[1] King Fahd Univ Petr & Minerals, Informat & Comp Sci Dept, Dhahran, Saudi Arabia
[2] SDAIA KFUPM Joint Res Ctr Artificial Intelligence, Dhahran 31261, Saudi Arabia
Keywords
Hashtag Recommendation; Hashtag Generation; Tweets Classification; Arabic Tweets; Twitter; Hashtags;
DOI
10.1007/s13369-023-08567-1
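Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]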
Discipline classification codes
07; 0710; 09
Abstract
As one of the largest microblogging websites in the world, Twitter generates a huge amount of information daily. The massive size of the generated data makes it harder for users to follow and receive information relevant to their interests. Therefore, Twitter allows users to annotate and categorize their tweets using appropriate hashtags. However, finding an appropriate hashtag for a tweet is not always straightforward. Furthermore, many users disrupt the hashtag flow by posting content that is irrelevant to the hashtag topic. These problems increase the need for a hashtag recommendation and classification system. This topic has received considerable attention from researchers in some languages, such as English and Chinese. However, this problem has not yet been explored for the Arabic language owing to the lack of datasets. In this study, we bridge this gap by proposing WASM, an Arabic Twitter hashtag recommendation dataset consisting of more than 100,000 tweets annotated with 87 hashtags. The proposed dataset is subjected to several rounds of automatic and manual filtering to ensure that it is suitable for tasks related to tweets and hashtags. Further, we propose three systems for hashtag recommendation and classification. Each system approaches the task differently, treating it as a classification, a generation, or a named entity recognition problem, respectively. The results obtained using these systems are promising and can be used to benchmark the WASM dataset. The data and code are available at https://github.com/Hamzah-Luqman/wasm.
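Illustrative note: the abstract mentions casting hashtag recommendation as a classification problem. The sketch below shows one generic way such a framing could look, using a small multinomial Naive Bayes model with add-one smoothing. It is not the authors' system; the function names `train` and `recommend`, the whitespace tokenizer, and the English toy tweets are all assumptions made for illustration and are not taken from WASM.

```python
import math
from collections import Counter, defaultdict


def train(tweets):
    """Fit a multinomial Naive Bayes model on (text, hashtag) pairs."""
    word_counts = defaultdict(Counter)  # hashtag -> token counts
    label_counts = Counter()            # hashtag -> number of tweets
    vocab = set()
    for text, hashtag in tweets:
        tokens = text.lower().split()   # assumption: whitespace tokenizer
        word_counts[hashtag].update(tokens)
        label_counts[hashtag] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def recommend(model, text, k=2):
    """Return the k hashtags with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for hashtag, n in label_counts.items():
        denom = sum(word_counts[hashtag].values()) + len(vocab)
        score = math.log(n / total)     # log prior of the hashtag
        for tok in text.lower().split():
            if tok in vocab:            # ignore tokens never seen in training
                # Laplace (add-one) smoothing
                score += math.log((word_counts[hashtag][tok] + 1) / denom)
        scores[hashtag] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data (not taken from WASM)
toy = [
    ("the match ended with a late goal", "#football"),
    ("great goal by the striker tonight", "#football"),
    ("new phone released with better camera", "#tech"),
    ("the camera on this phone is great", "#tech"),
    ("heavy rain expected tomorrow morning", "#weather"),
]
model = train(toy)
top = recommend(model, "what a goal in the match", k=2)
print(top)
```

Ranking all candidate hashtags by score, rather than returning a single label, is what turns a plain classifier into a recommender: the top k labels serve as suggestions. A real system for Arabic tweets would need Arabic-aware tokenization and normalization, which this sketch omits.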
Pages: 12131-12145
Number of pages: 15