Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers

Cited by: 28
Authors
Aborisade, Opeyemi Mulikat [1]
Anwar, Mohd [2]
Affiliations
[1] North Carolina A&T State Univ, Dept Appl Math, Greensboro, NC 27411 USA
[2] North Carolina A&T State Univ, Dept Comp Sci, Greensboro, NC USA
Funding
National Science Foundation (USA)
Keywords
authorship attribution; social computing; privacy; security; machine learning; logistic regression; naive Bayes; classification
DOI
10.1109/IRI.2018.00049
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
At a time when all it takes to open a Twitter account is a mobile phone, authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, we need valid measures to identify the authors of misinformation to avert these consequences. Researchers have proposed different authorship attribution techniques to approach this kind of problem. However, because tweets are limited to 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify the authors of tweets by comparing machine learning methods such as logistic regression and naive Bayes. The application's processes are tweet fetching, pre-processing, feature extraction, and development of a machine learning model for classification. This paper illustrates the authorship text classification process using machine learning techniques. In total, 46,895 tweets were used as training and testing data, and unique features specific to Twitter were extracted. The pre-processing phase comprised several steps, including removal of short texts, removal of stop words and punctuation, tokenization, and stemming. This approach transforms the pre-processed data into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors to train and test the classifier. The logistic regression classifier achieved the higher accuracy, 91.1%, compared with 89.8% for the naive Bayes classifier.
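The pipeline described in the abstract (pre-processing, feature extraction, then training and comparing two classifiers) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the suffix-stripping stemmer, the `min_tokens` threshold, the bag-of-words features, the two author labels, and the six toy tweets are all hypothetical, and the logistic regression is a simple two-class version trained by stochastic gradient ascent.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in", "for", "on"}

def preprocess(text, min_tokens=2):
    """Tokenize, drop punctuation and stop words, and stem crudely.
    Returns None for short texts (min_tokens is a hypothetical threshold)."""
    tokens = re.findall(r"[a-z0-9#@]+", text.lower())  # keeps #tags and @names
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude suffix stripping as a stand-in for a real stemmer.
    stems = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
    return stems if len(stems) >= min_tokens else None

def train_naive_bayes(docs, labels):
    """Collect class counts and per-class token counts (multinomial model)."""
    vocab = {t for d in docs for t in d}
    token_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        token_counts[y].update(d)
    return vocab, Counter(labels), token_counts

def predict_naive_bayes(model, doc):
    """Return the class with the highest Laplace-smoothed log posterior."""
    vocab, class_counts, token_counts = model
    total = sum(class_counts.values())
    scores = {}
    for y, n in class_counts.items():
        denom = sum(token_counts[y].values()) + len(vocab)
        scores[y] = math.log(n / total) + sum(
            math.log((token_counts[y][t] + 1) / denom) for t in doc if t in vocab)
    return max(scores, key=scores.get)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(docs, labels, positive, epochs=200, rate=0.1):
    """Binary logistic regression on token counts, fitted by stochastic
    gradient ascent on the log-likelihood."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for d, y in zip(docs, labels):
            counts = Counter(d)
            z = bias + sum(weights[t] * c for t, c in counts.items())
            err = (1.0 if y == positive else 0.0) - sigmoid(z)
            bias += rate * err
            for t, c in counts.items():
                weights[t] += rate * err * c
    negative = next(y for y in labels if y != positive)
    return dict(weights), bias, positive, negative

def predict_logistic_regression(model, doc):
    """Predict the positive class when the estimated probability is >= 0.5."""
    weights, bias, positive, negative = model
    z = bias + sum(weights.get(t, 0.0) for t in doc)
    return positive if sigmoid(z) >= 0.5 else negative

# Hypothetical toy tweets from two invented authors.
raw = [
    ("alice", "Loving the new #python release, great typing features"),
    ("alice", "Debugging #python code all night again"),
    ("alice", "Reading about typing and testing in python"),
    ("bob", "Great match today, what a goal! #football"),
    ("bob", "Training for the football final this weekend"),
    ("bob", "That goal in the final was amazing #football"),
]
pairs = [(y, preprocess(t)) for y, t in raw]
pairs = [(y, d) for y, d in pairs if d is not None]  # drop short texts
labels = [y for y, _ in pairs]
docs = [d for _, d in pairs]

nb_model = train_naive_bayes(docs, labels)
lr_model = train_logistic_regression(docs, labels, positive="alice")

query = preprocess("python typing is great")
print("naive Bayes:", predict_naive_bayes(nb_model, query))
print("logistic regression:", predict_logistic_regression(lr_model, query))
```

On a real corpus, both classifiers would be scored on a held-out test split so their accuracies can be compared, as the abstract reports; the toy data here is far too small for such a comparison to mean anything.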
Pages: 269-276
Page count: 8