SANAD: Single-label Arabic News Articles Dataset for automatic text categorization

Times cited: 32
Authors
Einea, Omar [1]
Elnagar, Ashraf [1]
Al Debsi, Ridhwan [1]
Affiliation
[1] University of Sharjah, Sharjah, United Arab Emirates
Source
Data in Brief, 2019, Vol. 25
Keywords
Arabic; Natural language processing; News articles; Single-label text classification
DOI
10.1016/j.dib.2019.104076
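Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09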
Abstract
Text classification (also known as text categorization) is one of the most popular Natural Language Processing (NLP) tasks and has been an active research topic in recent years. However, much less attention has been directed towards this task in Arabic, due to the lack of rich, representative resources for training an Arabic text classifier. Therefore, we introduce SANAD, a large Single-label Arabic News Articles Dataset of textual data collected from three news portals. The dataset consists of almost 200k articles distributed into seven categories, and we offer it to the research community on Arabic computational linguistics. We anticipate that this rich dataset will be of great aid for a variety of NLP tasks on Modern Standard Arabic (MSA) textual data, especially for single-label text classification. We present the data in raw form. SANAD is composed of three main datasets scraped from three news portals: AlKhaleej, AlArabiya, and Akhbarona. SANAD is public and freely available at https://data.mendeley.com/datasets/57zpx667y9. (c) 2019 The Author(s). Published by Elsevier Inc.
Pages: 5