Authorship Attribution of Arabic Articles

Cited by: 3
Authors
Hajja, Maha [1]
Yahya, Ahmad [1]
Yahya, Adnan [1]
Affiliations
[1] Birzeit Univ, Dept Elect & Comp Engn, Birzeit, Palestine
Keywords
Arabic authorship attribution; Arabic plagiarism detection; Writing style recognition; Arabic special features; Arabic text author identification;
DOI
10.1007/978-3-030-32959-4_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the huge size and large diversity of web content and the appearance of more social media platforms and blog websites, more people are contributing content of varying quality. Many users prefer to remain anonymous when posting material to the web, so more texts (articles, blogs, essays and emails) are published under assumed identities or with no known author. This may lead to copyright and other legal issues, hence the need for good authorship attribution systems. The problem may be more acute for Arabic texts due to restrictions, actual and perceived, on electronic content publication and the prevailing social norms. In this paper we study Arabic author attribution (AAA): identifying the author of an Arabic (Modern Standard Arabic, MSA) article from among a given set of potential authors. Many features were considered for training and testing our AAA models. We studied the effects of features such as part-of-speech (PoS) tags, stylistic properties such as punctuation usage and sentence characteristics, word types and word diversity. In general, PoS features, word n-gram features and rare words proved to be the most informative for our task. We also investigated the effect of factors such as the number of potential authors, the number of articles per author, and the size of the text chunks used, and we report the results.
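As a rough illustration of the kinds of features the abstract names (punctuation usage, sentence characteristics, word diversity, rare words and word n-grams), the following minimal Python sketch computes a few of them for one text chunk. It is not the authors' implementation: the function name `stylometric_features`, the punctuation set, and the exact feature definitions are assumptions made for this example, and PoS features are left out because they would require an external Arabic tagger.

```python
import re
from collections import Counter

# Arabic and Latin punctuation marks to count (an illustrative choice).
PUNCTUATION = "،؛؟.!:,;?"


def stylometric_features(text, n=2):
    """Return a few simple style features for one text chunk (hypothetical helper)."""
    # Split into sentences on sentence-final marks, dropping empty pieces.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # \w+ matches Unicode word characters, including Arabic letters.
    # Assumption: the text is undiacritized, as is usual for MSA articles.
    words = re.findall(r"\w+", text)
    counts = Counter(words)
    # Word n-grams as tuples of n consecutive words.
    ngrams = Counter(zip(*(words[i:] for i in range(n))))
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Word diversity: distinct words divided by total words.
        "type_token_ratio": len(counts) / n_words,
        # Rare words: share of tokens that occur exactly once in the chunk.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n_words,
        "punct_per_word": sum(text.count(p) for p in PUNCTUATION) / n_words,
        "top_ngrams": ngrams.most_common(3),
    }


if __name__ == "__main__":
    sample = "ذهب الولد إلى المدرسة. ذهب الولد إلى البيت، ثم نام!"
    for name, value in stylometric_features(sample).items():
        print(name, value)
```

Feature dictionaries like this, computed per chunk and per candidate author, could then be passed to any standard classifier; the record does not state which classifier the paper uses.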
Pages: 194-208
Page count: 15
Related papers
50 in total
  • [1] Authorship Attribution of Arabic Tweets
    Rabab'ah, Abdullateef
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Aldwairi, Monther
    [J]. 2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016,
  • [2] Authorship Attribution in Arabic Poetry
    Ahmed, Alfalahi
    Mohamed, Ramdani
    Mostafa, Bellafkih
    [J]. 2016 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS: THEORIES AND APPLICATIONS (SITA), 2016,
  • [3] On Authorship Authentication of Arabic Articles
    Alwajeeh, Ahmed
    Al-Ayyoub, Mahmoud
    Hmeidi, Ismail
    [J]. 2014 5TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2014,
  • [4] Authorship Attribution of Polish Newspaper Articles
    Kuta, Marcin
    Puto, Bartlomiej
    Kitowski, Jacek
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, (ICAISC 2016), PT II, 2016, 9693 : 474 - 483
  • [5] The Effectiveness of Stemming in the Stylometric Authorship Attribution in Arabic
    Omar, Abdulfattah
    Hamouda, Wafya Ibrahim
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 116 - 121
  • [6] The effectiveness of stemming in the stylometric authorship attribution in Arabic
    Omar, Abdulfattah
    Hamouda, Wafya Ibrahim
    [J]. International Journal of Advanced Computer Science and Applications, 2020, 11 (01): : 116 - 121
  • [7] Arabic Authorship Attribution: An Extensive Study on Twitter Posts
    Altakrori, Malik H.
    Iqbal, Farkhund
    Fung, Benjamin C. M.
    Ding, Steven H. H.
    Tubaishat, Abdallah
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (01)
  • [8] Naive Bayes classifiers for authorship attribution of Arabic texts
    Altheneyan, Alaa Saleh
    Menai, Mohamed El Bachir
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2014, 26 (04) : 473 - 484
  • [9] A Comparative Survey of Authorship Attribution on Short Arabic Texts
    Ouamour, Siham
    Sayoud, Halim
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 479 - 489
  • [10] Towards Authorship Attribution in Arabic Short-Microblog Text
    Jambi, Kamal Mansour
    Khan, Imtiaz Hussain
    Siddiqui, Muazzam Ahmed
    Alhaj, Salma Omar
    [J]. IEEE ACCESS, 2021, 9 : 128506 - 128520