OCA: Opinion Corpus for Arabic

Cited by: 154
Authors
Rushdi-Saleh, Mohammed [1]
Teresa Martin-Valdivia, M. [1]
Alfonso Urena-Lopez, L. [1]
Perea-Ortega, Jose M. [1]
Affiliations
[1] Univ Jaen, Dept Comp Sci, SINAI Res Grp, Jaen 23071, Spain
Keywords
Data mining - Learning systems - Websites - Learning algorithms
DOI
10.1002/asi.21598
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis is a challenging new task related to text mining and natural language processing. Although there are, at present, several studies related to this theme, most of these focus mainly on English texts. The resources available for opinion mining (OM) in other languages are still limited. In this article, we present a new Arabic corpus for the OM task that has been made available to the scientific community for research purposes. The corpus contains 500 movie reviews collected from different web pages and blogs in Arabic, 250 of them considered as positive reviews, and the other 250 as negative opinions. Furthermore, different experiments have been carried out on this corpus, using machine learning algorithms such as support vector machines and Naive Bayes. The results obtained are very promising and we are encouraged to continue this line of research.
Pages: 2045 - 2054
Page count: 10
Related papers
50 in total
  • [1] ARAACOM: ARAbic Algerian Corpus for Opinion Mining
    Rahab, Hichem
    Zitouni, Abdelhafid
    Djoudi, Mahieddine
    [J]. ACM PROCEEDINGS OF INTERNATIONAL CONFERENCE OF COMPUTING FOR ENGINEERING AND SCIENCE (ICCES'17), 2017, : 35 - 39
  • [3] Arabic Corpus Linguistics
    Al-Surmi, Mansoor
    [J]. CORPORA, 2021, 16 (02) : 301 - 303
  • [4] A 700M+Arabic corpus: KACST Arabic corpus design and construction
    Al-Thubaity, Abdulmohsen O.
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2015, 49 (03) : 721 - 751
  • [6] A Monolingual Parallel Corpus of Arabic
    Al-Raisi, Fatima
    Lin, Weijian
    Bourai, Abdelwahab
    [J]. ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142 : 334 - 338
  • [7] A Multidialectal Parallel Corpus of Arabic
    Bouamor, Houda
    Habash, Nizar
    Oflazer, Kemal
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1240 - 1245
  • [8] The Constitution of an Arabic Touristic Corpus
    Lhioui, Chahira
    Zouaghi, Anis
    Zrigui, Mounir
    [J]. ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142 : 14 - 25
  • [9] Arabic corpus linguistics.
    Holes, Clive
    [J]. LANGUAGE, 2020, 96 (01) : 202 - 206
  • [10] AGENESIS OF THE CORPUS-CALLOSUM ASSOCIATED WITH OCULOCUTANEOUS ALBINISM (OCA)
    PONDER, SW
    GOLD, DM
    LOCKHART, LH
    [J]. CLINICAL RESEARCH, 1988, 36 (01) : A60 - A60