The Optimization of n-Gram Feature Extraction Based on Term Occurrence for Cyberbullying Classification

Cited by: 0
Authors
Setiawan, Yudi [1]
Maulidevi, Nur Ulfa [1]
Surendro, Kridanto [1]
Affiliations
[1] School of Electrical Engineering and Informatics, Institute of Technology Bandung, Bandung, Indonesia
Abstract
Cyberbullying communications should be categorized, since online harassment is growing and has serious implications. The high prevalence of cyberbullying requires strong text classification algorithms to safeguard individuals and communities. The n-Gram approach models language by collecting ‘n’ components, usually words or characters, from a text and detecting how words relate to one another and whether key terms or sentences mark a document as cyberbullying. This research improves term value generation and text classification accuracy by extracting features using TF-IDF and n-Gram. The optimal TF-IDF feature extraction pattern demonstrates the usefulness of n-Gram in cyberbullying document classification. The field demands both good categorization and good feature extraction, and because cyberbullying takes numerous forms and occurs in many venues, broad classification is essential. To test unigram, bigram, and trigram approaches across text lengths and frequencies, this study uses several parameter values. The research also shows the limitations and gaps in earlier methods and underscores the need for varied n-Gram parameter values to handle the complexity of cyberbullying text. Short-sentence articles, fluctuating data frequencies, and dynamic online interactions call for more sophisticated solutions. Ideal n-Gram patterns improve cyberbullying text categorization and give context to the field. This research acknowledges the prevalence and effects of cyberbullying, the need for effective categorization methods, and the limitations of current techniques, opening the way for more comprehensive and adaptive strategies for combating online harassment. © 2024 The Author(s).
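As an illustration of the feature extraction the abstract describes, the following is a minimal sketch of word-level n-Gram TF-IDF weighting for unigrams, bigrams, and trigrams. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_ngrams`, the toy sentences, and the TF-IDF variant (relative term frequency multiplied by log(N / df)) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_ngrams(docs, n):
    """Return one {n-gram: TF-IDF weight} dict per document.

    Assumed variant: TF = count / number of n-grams in the document,
    IDF = log(N / df), where df is the number of documents containing the n-gram.
    """
    grams_per_doc = [word_ngrams(d, n) for d in docs]

    # Document frequency: count each n-gram once per document.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


# Made-up toy sentences, for illustration only.
docs = [
    "you are so stupid",
    "you are so kind",
    "nobody likes you",
]

for n in (1, 2, 3):  # unigram, bigram, trigram
    features = tfidf_ngrams(docs, n)
    top = sorted(features[0].items(), key=lambda kv: -kv[1])[:2]
    print(n, top)
```

In this toy corpus the unigram "you" occurs in every sentence and so receives a weight of zero, while longer n-Grams become sparser and more document-specific. That is one reason a study might compare several values of n across text lengths and term frequencies.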
DOI
10.5334/dsj-2024-031