Feature extraction and selection for Arabic tweets authorship authentication

Cited by: 22
Authors
Al-Ayyoub, Mahmoud [1 ]
Jararweh, Yaser [1 ]
Rabab'ah, Abdullateef [1 ]
Aldwairi, Monther [2 ]
Affiliations
[1] Jordan Univ Sci & Technol, Irbid, Jordan
[2] Zayed Univ, Dubai, U Arab Emirates
Keywords
Online social networks; Authorship authentication; Computational intelligence; Stylometric features; BOW features; SVM; NB; Decision tree; Correlation-based feature selection; Relief; PCA; Information gain; E-MAIL; IDENTIFICATION; ATTRIBUTION;
DOI
10.1007/s12652-017-0452-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In tweet authentication, we are concerned with correctly attributing a tweet to its true author based on its textual content. The more general problem of authenticating long documents has been studied before, and the most common approach relies on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Inspired by the success of modern automatic document classification, some researchers followed the Bag-Of-Words (BOW) approach for authenticating long documents. In this work, we consider both approaches and their application to authenticating tweets, which present additional challenges due to their limited length. We focus on the Arabic language due to its importance and the scarcity of work related to it. We create different sets of features from both approaches and compare the performance of different classifiers using them. We experiment with various feature selection techniques in order to extract the most discriminating features. To the best of our knowledge, this is the first study of its kind to combine these different sets of features for authorship analysis of Arabic tweets. The results show that combining all the feature sets we compute yields the best results.
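As a rough illustration of what combining stylometric (SF) and Bag-Of-Words (BOW) feature sets for a single tweet can look like, here is a minimal sketch in plain Python. The four stylometric features, the whitespace tokenizer, the `sf_`/`bow_` key prefixes, and every function name are assumptions made for this example; the abstract does not list the paper's actual features or preprocessing, and no classifier or feature selection step is shown.

```python
from collections import Counter
import string

# ASCII punctuation plus a few common Arabic marks (an illustrative choice).
PUNCT = set(string.punctuation + "،؛؟")


def stylometric_features(text):
    """Compute a few simple style features (hypothetical, not the paper's list)."""
    tokens = text.split()
    n_tokens = len(tokens)
    return {
        "sf_char_count": len(text),
        "sf_token_count": n_tokens,
        "sf_avg_token_len": sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
        "sf_punct_count": sum(ch in PUNCT for ch in text),
    }


def build_vocabulary(tweets):
    """Collect the sorted set of whitespace-separated tokens across all tweets."""
    return sorted({token for tweet in tweets for token in tweet.split()})


def bow_features(text, vocabulary):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(text.split())
    return {"bow_" + word: counts[word] for word in vocabulary}


def combined_features(text, vocabulary):
    """Merge both feature sets into one dictionary per tweet."""
    features = stylometric_features(text)
    features.update(bow_features(text, vocabulary))
    return features


# Two made-up example tweets (not data from the paper).
tweets = ["مرحبا بكم في الاردن", "مرحبا ، كيف الحال ؟"]
vocab = build_vocabulary(tweets)
vectors = [combined_features(t, vocab) for t in tweets]
```

Each entry of `vectors` is one feature dictionary per tweet. In a fuller pipeline, these would be turned into numeric arrays and passed to a feature selection step and a classifier of the kind named in the keywords.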
Pages: 383-393
Number of pages: 11
Related papers
50 in total
  • [1] Feature extraction and selection for Arabic tweets authorship authentication
    Mahmoud Al-Ayyoub
    Yaser Jararweh
    Abdullateef Rabab’ah
    Monther Aldwairi
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2017, 8 : 383 - 393
  • [2] Using Big Data Analytics For Authorship Authentication of Arabic Tweets
    Albadarneh, Jafar
    Talafha, Bashar
    Al-Ayyoub, Mahmoud
    Zaqaibeh, Belal
    Al-Smadi, Mohammad
    Jararweh, Yaser
    Benkhelifa, Elhadj
    [J]. 2015 IEEE/ACM 8TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2015, : 448 - 452
  • [3] Authorship Attribution of Arabic Tweets
    Rabab'ah, Abdullateef
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Aldwairi, Monther
    [J]. 2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016,
  • [4] On Authorship Authentication of Arabic Articles
    Alwajeeh, Ahmed
    Al-Ayyoub, Mahmoud
    Hmeidi, Ismail
    [J]. 2014 5TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2014,
  • [5] Investigating Predictive Features for Authorship Verification of Arabic Tweets
    Alqahtani, Fatimah
    Dohler, Mischa
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2022, 22 (06): : 115 - 126
  • [6] Improving arabic signature authentication with quantum inspired evolutionary feature selection
    Abdulhussien, Ansam A.
    Nasrudin, Mohammad F.
    Darwish, Saad M.
    Alyasseri, Zaid A.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (28) : 71495 - 71524
  • [7] Analysis and Evaluation of Two Feature Selection Algorithms in Improving the Performance of the Sentiment Analysis Model of Arabic Tweets
    Yousef, Maria
    ALali, Abdulla
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (06) : 705 - 711
  • [8] Sentiment Analysis of Arabic Tweets: Opinion Target Extraction
    Salima, Behdenna
    Fatiha, Barigou
    Ghalem, Belalem
    [J]. MODELLING AND IMPLEMENTATION OF COMPLEX SYSTEMS, 2019, 64 : 158 - 167
  • [9] Finger Vein Extraction and Authentication Based on Gradient Feature Selection Algorithm
    Parthiban, K.
    Wahi, Amitabh
    Sundaramurthy, S.
    Palanisamy, C.
    [J]. 2014 FIFTH INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL INFORMATION AND WEB TECHNOLOGIES (ICADIWT), 2014, : 143 - 147
  • [10] Topic Extraction from Millions of Tweets using Singular Value Decomposition and Feature Selection
    Hashimoto, Takako
    Kuboyama, Tetsuji
    Chakraborty, Basabi
    [J]. 2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 1145 - 1150