Spam detection using character N-grams

Times cited: 0

Authors:

Kanaris, Ioannis [1]

Kanaris, Konstantinos

Stamatatos, Efstathios

Affiliations:

[1] Univ Aegean, Dept Informat & Commun Syst Engn, GR-83200 Karlovassi, Greece

[2] Univ Aegean, Dept Math, GR-83200 Karlovassi, Greece

Source:

ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2006 / Vol. 3955

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a content-based approach to spam detection based on low-level information. Instead of the traditional 'bag of words' representation, we use a 'bag of character n-grams' representation, which avoids the sparse data problem that arises with word-level n-grams. Moreover, it is language-independent and does not require a lemmatizer or any 'deep' text preprocessing. Based on experiments on the Ling-Spam corpus, we evaluate the proposed representation in combination with support vector machines. Both binary and term-frequency representations achieve high precision while maintaining equally high recall, which is crucial for anti-spam filters, a cost-sensitive application.
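The 'bag of character n-grams' representation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram length (n = 3), the frequency-based vocabulary cut-off (`max_features`), and the function names `char_ngrams`, `build_vocabulary`, and `vectorize` are hypothetical choices. The text is split directly into overlapping character sequences with no tokenizer or lemmatizer, and each message becomes either a binary or a term-frequency vector.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical defaults: n=3 and max_features=1000 are illustrative,
# not the settings used in the paper.


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return all overlapping character n-grams of `text` (no tokenizing or lemmatizing)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(docs: List[str], n: int = 3, max_features: int = 1000) -> Dict[str, int]:
    """Map the `max_features` most frequent n-grams in the corpus to column indices."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(char_ngrams(doc, n))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: idx for idx, (gram, _) in enumerate(ranked[:max_features])}


def vectorize(doc: str, vocab: Dict[str, int], n: int = 3, binary: bool = False) -> List[int]:
    """Represent `doc` as a bag of character n-grams over `vocab`.

    binary=True  -> 1 if the n-gram occurs at least once, else 0
    binary=False -> term frequency (number of occurrences)
    N-grams not in the vocabulary are ignored.
    """
    vec = [0] * len(vocab)
    for gram in char_ngrams(doc, n):
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] = 1 if binary else vec[idx] + 1
    return vec


if __name__ == "__main__":
    corpus = ["Win FREE money now!", "Meeting moved to noon."]
    vocab = build_vocabulary(corpus, n=3, max_features=20)
    message = "FREE money, FREE!"
    print(vectorize(message, vocab, binary=True))   # binary representation
    print(vectorize(message, vocab, binary=False))  # term-frequency representation
```

In the setting the abstract describes, these vectors would be the input to a support vector machine classifier; the SVM training step is omitted from this sketch.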

Pages: 95-104

Number of pages: 10