WASM: A Dataset for Hashtag Recommendation for Arabic Tweets

Authors
Al-Shaibani, Maged S. [1]
Luqman, Hamzah [1, 2]
Al-Ghofaily, Abdulaziz S. [1]
Al-Najim, Abdullatif A. [1]
Affiliations
[1] King Fahd Univ Petr & Minerals, Informat & Comp Sci Dept, Dhahran, Saudi Arabia
[2] SDAIA KFUPM Joint Res Ctr Artificial Intelligence, Dhahran 31261, Saudi Arabia
Keywords
Hashtag Recommendation; Hashtag Generation; Tweets Classification; Arabic Tweets; Twitter; Hashtags
DOI
10.1007/s13369-023-08567-1
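Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09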
Abstract
As one of the largest microblogging websites in the world, Twitter generates a huge amount of information daily. The sheer volume of this data makes it difficult for users to follow and receive information relevant to their interests. Twitter therefore allows users to annotate and categorize their tweets with appropriate hashtags. However, finding an appropriate hashtag for a tweet is not always straightforward. Furthermore, many users disrupt a hashtag's stream by posting content that is irrelevant to its topic. These problems increase the need for a hashtag recommendation and classification system. The topic has received considerable attention from researchers for some languages, such as English and Chinese, but it has not yet been explored for Arabic owing to the lack of datasets. In this study, we bridge this gap by proposing WASM, an Arabic Twitter hashtag recommendation dataset consisting of more than 100,000 tweets annotated with 87 hashtags. The dataset underwent several rounds of automatic and manual filtering to ensure that it is suitable for tasks related to tweets and hashtags. Further, we propose three systems for hashtag recommendation and classification. Each system approaches the task differently, treating it as a classification, generation, or named entity recognition problem, respectively. The results obtained using these systems are promising and can be used as benchmarks for the WASM dataset. The data and code are available at https://github.com/Hamzah-Luqman/wasm.
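To make the classification framing in the abstract concrete, the following is a minimal sketch that treats each hashtag as a class label and ranks hashtags for a new tweet. It is not the authors' system: the abstract does not describe their models, so a small multinomial naive Bayes classifier is swapped in purely for illustration. The training tweets, the names `TRAIN`, `train_naive_bayes`, and `recommend`, and the whitespace tokenizer are all assumptions, and the toy examples are invented rather than taken from WASM.

```python
from collections import Counter, defaultdict
import math

# Invented toy examples (NOT from the WASM dataset): tweet text with the
# hashtag stripped, paired with the hashtag used as the class label.
TRAIN = [
    ("the match tonight was great and the team scored twice", "#football"),
    ("our team lost the match after a late goal", "#football"),
    ("heavy rain and strong wind expected tomorrow", "#weather"),
    ("sunny morning then rain in the evening", "#weather"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count hashtag frequencies and per-hashtag word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def recommend(text, model, k=1):
    """Return the k hashtags with the highest naive Bayes log-score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


model = train_naive_bayes(TRAIN)
print(recommend("rain and wind all day", model))
```

Passing a larger `k` returns a ranked list of candidate hashtags, which is closer to a recommendation setting than a single predicted label.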
Pages: 12131-12145
Number of pages: 15