The Optimization of n-Gram Feature Extraction Based on Term Occurrence for Cyberbullying Classification

Cited by: 0

Authors:

Setiawan, Yudi [1]

Maulidevi, Nur Ulfa [1]

Surendro, Kridanto [1]

Affiliation:

[1] School of Electrical Engineering and Informatics, Institute of Technology Bandung, Bandung, Indonesia

Source:

Data Science Journal | 2024 / Vol. 23 / No. 01

Abstract:

Cyberbullying communications need to be categorized, because online harassment is growing and has serious consequences. The high prevalence of cyberbullying calls for robust text classification algorithms to protect individuals and communities. An n-Gram model represents language by collecting sequences of n units, usually words or characters, from a text; these sequences capture how words relate to one another and help determine whether key terms or sentences belong to the cyberbullying document class. This research improves term-value generation and text classification accuracy by extracting features with TF-IDF and n-Grams. The optimal TF-IDF feature extraction pattern demonstrated the usefulness of n-Grams in cyberbullying document classification. The field demands effective categorization and feature extraction, and because cyberbullying takes many forms and occurs on many platforms, broad classification is essential. To test unigram, bigram, and trigram approaches across different text lengths and term frequencies, this study uses several parameter values. The research also identifies the limitations of and gaps in earlier methods, and underscores the need for varied n-Gram parameter values to handle the complexity of cyberbullying text. Articles made of short sentences, fluctuating term frequencies, and dynamic online interactions call for more sophisticated solutions. Ideal n-Gram patterns improve cyberbullying text categorization and provide context for the field. This research acknowledges the prevalence and effects of cyberbullying, the need for effective categorization methods, and the limitations of current techniques, opening the way to more comprehensive and adaptive strategies for combating online harassment. © 2024 The Author(s).
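As a rough illustration of the feature extraction the abstract describes, the following Python sketch builds word-level unigram, bigram, and trigram terms and weights them with TF-IDF. It is not the authors' implementation: the lowercase word tokenizer, the raw-count term frequency, the idf = ln(N/df) weighting, and the `min_df` document-occurrence threshold are all assumptions chosen for illustration, and the helper names (`tokenize`, `ngrams`, `tfidf`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens only; the paper's preprocessing may differ.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    # Slide a window of n tokens over the text and join each window into one term.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: list[str], n: int, min_df: int = 1) -> list[dict[str, float]]:
    """Return one {n-gram: weight} dict per document.

    tf  = raw count of the n-gram in the document
    idf = ln(N / df), where df is the number of documents containing the n-gram
    min_df is a hypothetical term-occurrence threshold: n-grams that occur in
    fewer than min_df documents are dropped from the vocabulary.
    """
    counts = [Counter(ngrams(tokenize(d), n)) for d in docs]
    # Iterating a Counter yields each distinct term once, so df counts documents.
    df = Counter(term for c in counts for term in c)
    total_docs = len(docs)
    vocab = {term for term, k in df.items() if k >= min_df}
    return [
        {t: tf * math.log(total_docs / df[t]) for t, tf in c.items() if t in vocab}
        for c in counts
    ]


if __name__ == "__main__":
    corpus = [
        "you are so stupid",
        "you are so kind",
        "nobody likes you, you are stupid",
    ]
    # Compare unigram, bigram, and trigram features on the same toy corpus.
    for n in (1, 2, 3):
        print(n, sorted(tfidf(corpus, n)[0].items()))
```

In this toy corpus, "you" occurs in every document, so its weight is 0 under this idf variant, while the bigram "so stupid" occurs in only one document and receives ln 3 ≈ 1.10. Libraries such as scikit-learn expose a comparable choice through `TfidfVectorizer(ngram_range=(1, 3), min_df=...)`, although their default idf is smoothed and the resulting weights differ from the formula used here.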

DOI:

10.5334/dsj-2024-031

