Spam detection on social networks using deep contextualized word representation

Cited by: 8

Authors:

Ghanem, Razan [1]

Erbay, Hasan [2]

Affiliations:

[1] Kirikkale Univ, Dept Comp Engn, Kirikkale, Turkey

[2] Univ Turkish Aeronaut Assoc, Dept Comp Engn, Ankara, Turkey

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 82 / Issue 03

Keywords:

Spam detection; Deep learning; Word embedding; Recurrent neural network; Embedding from language model; ACCOUNTS;

DOI:

10.1007/s11042-022-13397-8

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Spam detection on social networks, treated as a short-text classification problem, is a challenging task in natural language processing because the text is sparse and ambiguous. A powerful text representation is key to addressing this problem. Traditional word embedding models solve the data sparsity problem by representing words with dense vectors, but they have limitations that prevent them from handling some cases effectively. The most common limitation is the "out of vocabulary" problem, in which a model fails to provide any vector representation for words that are not present in its dictionary. Another is independence from context: the model outputs just one vector for each word, regardless of the word's position in the sentence. To overcome these problems, we propose a new model based on deep contextualized word representation. In this study, we develop CBLSTM (Contextualized Bi-directional Long Short Term Memory neural network), a novel deep learning architecture based on a bidirectional long short-term memory neural network with embeddings from language models, to address the spam text problem on social networks. Experimental results on three benchmark datasets show that the proposed method achieves high accuracy and outperforms existing state-of-the-art methods for detecting spam on social networks.
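
The two limitations named in the abstract can be illustrated with a toy sketch. This is not the authors' CBLSTM and does not use a real language model: the table values, the trigram hashing in `char_vector`, and the neighbour mixing in `contextual_embed` are all hypothetical stand-ins, chosen only to show that a static lookup returns nothing for an unseen word and the same vector in every sentence, while a character-based, context-mixed representation does neither.

```python
import hashlib

DIM = 4  # toy vector size (assumption, chosen for readability)

# Hypothetical static embedding table; values are made up, not from a trained model.
STATIC_TABLE = {
    "free": [0.9, 0.1, 0.0, 0.2],
    "win": [0.8, 0.2, 0.1, 0.1],
    "prize": [0.7, 0.3, 0.0, 0.3],
    "now": [0.1, 0.5, 0.4, 0.0],
}


def static_embed(word):
    """Return the table vector, or None when the word is out of vocabulary."""
    return STATIC_TABLE.get(word)


def char_vector(word):
    """Average hashed character trigrams, so any string receives a vector."""
    padded = f"<{word}>"
    vec = [0.0] * DIM
    n_trigrams = max(len(padded) - 2, 1)
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        for d in range(DIM):
            vec[d] += digest[d] / 255.0
    return [v / n_trigrams for v in vec]


def contextual_embed(tokens):
    """Mix each token's character vector with its left and right neighbours."""
    base = [char_vector(t) for t in tokens]
    zero = [0.0] * DIM
    out = []
    for i, vec in enumerate(base):
        left = base[i - 1] if i > 0 else zero
        right = base[i + 1] if i < len(base) - 1 else zero
        out.append([0.5 * v + 0.25 * l + 0.25 * r
                    for v, l, r in zip(vec, left, right)])
    return out


if __name__ == "__main__":
    # Out of vocabulary: the obfuscated spelling has no entry in the static table.
    print(static_embed("fr33"))
    # The character-based path still produces a vector for it.
    print(contextual_embed(["win", "fr33", "prize"])[1])
    # Context dependence: "free" gets different vectors in different sentences.
    print(contextual_embed(["win", "free", "prize"])[1])
    print(contextual_embed(["feel", "free", "now"])[1])
```

In a real system the character-trigram averaging would be replaced by a pretrained contextual encoder and the neighbour mixing by a learned bidirectional recurrent layer; the sketch only mirrors the shape of the argument.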

Pages: 3697 - 3712

Page count: 16
