Detecting Spam Tweets using Character N-gram Features

Authors
Ashour, Mokhtar [1 ]
Salama, Cherif [1 ]
El-Kharashi, M. Watheq [1 ]
Affiliations
[1] Ain Shams Univ, Comp & Syst Engn Dept, Cairo, Egypt
Keywords
machine learning; n-grams; spam detection; Twitter
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Twitter's popularity has made it an important and instantaneous source of news and trending events around the world. This popularity has attracted spammers, who post malicious content embedded in tweets and in their profile pages. Spammers use varied and evolving techniques to evade traditional security mechanisms, which creates a need for robust solutions that adapt to these techniques. In this paper, we propose a low-level character n-gram feature that avoids the use of tokenizers or any language-dependent tools. Using a publicly available dataset, we evaluate the performance of multiple machine learning classifiers with different representations of the proposed feature. Our experiments show that our approach improves on approaches that use word n-grams from tweet tokens. We also show that our technique can detect spam tweets with low latency, which is crucial in a real-time environment like Twitter.
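As an illustration of the kind of feature the abstract describes, the following is a minimal Python sketch of extracting character n-grams directly from raw tweet text, with no tokenizer and no language-specific preprocessing. The function names (`char_ngrams`, `ngram_counts`), the choice of n = 1 to 3, and the sample tweet are assumptions made for this example; the record does not state the paper's actual n-gram range, feature representation, or classifiers.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every overlapping character n-gram of `text`.

    The text is sliced character by character, so no tokenizer or
    language-dependent tool is involved. Strings shorter than n
    yield an empty list.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text, n_values=(1, 2, 3)):
    """Count character n-grams for each n in `n_values`.

    The default range (1, 2, 3) is an illustrative assumption,
    not a setting taken from the paper.
    """
    counts = Counter()
    for n in n_values:
        counts.update(char_ngrams(text, n))
    return counts


# Hypothetical tweet text, used only to exercise the functions.
sample_tweet = "WIN a FREE phone!!! http://t.co/xyz"
features = ngram_counts(sample_tweet)
```

A count mapping like `features` is one possible representation; binary presence or TF-IDF weighting over the same n-grams would be alternatives, and any of them could be passed to a standard classifier.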
Pages: 190-195
Number of pages: 6