The Optimization of n-Gram Feature Extraction Based on Term Occurrence for Cyberbullying Classification

Cited by: 0
Authors
Setiawan, Yudi [1]
Maulidevi, Nur Ulfa [1]
Surendro, Kridanto [1]
Affiliation
[1] School of Electrical Engineering and Informatics, Institute of Technology Bandung, Bandung, Indonesia
Abstract
Cyberbullying messages should be classified, since online harassment is growing and has serious consequences. The high prevalence of cyberbullying calls for robust text classification algorithms to protect individuals and communities. An n-Gram model represents language by taking sequences of n contiguous components, usually words or characters, from a text, capturing how words relate to one another and whether key terms or sentences mark a document as cyberbullying. This research improves term value generation and text classification accuracy by extracting features with TF-IDF and n-Gram. The optimum TF-IDF feature extraction pattern demonstrated the usefulness of n-Gram in cyberbullying document classification. The field demands accurate classification and feature extraction, and because cyberbullying takes numerous forms and occurs in many venues, broad classification is essential. To test unigram, bigram, and trigram approaches across text lengths and term frequencies, this study uses several parameter values. The research also shows the limitations and gaps of earlier methods and underscores the need for varied n-Gram parameter values to handle the complexity of cyberbullying text. Short-sentence documents, fluctuating data frequencies, and dynamic online interactions require more sophisticated solutions. Optimal n-Gram patterns improve cyberbullying text classification and add context to the field. This research acknowledges the prevalence and effects of cyberbullying, the need for effective classification methods, and the limitations of current techniques, opening the way for more comprehensive and adaptive strategies for combating online harassment. © 2024 The Author(s).
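The feature extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact TF-IDF formula or tokenization used, so the code below assumes whitespace tokenization, relative term frequency, and an unsmoothed logarithmic inverse document frequency. The function names `word_ngrams` and `tfidf` and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the contiguous word n-grams of `text` as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Compute one {n-gram: weight} dict per document.

    Assumed weighting (not confirmed by the paper):
    tf = count / total n-grams in the document, idf = ln(N / df).
    """
    grams_per_doc = [word_ngrams(doc, n) for doc in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append({g: (c / total) * math.log(len(docs) / df[g])
                        for g, c in counts.items()})
    return weights


# Toy, made-up documents for illustration only.
docs = ["you are so stupid", "you are so kind", "nobody likes you"]
for n in (1, 2, 3):  # unigram, bigram, trigram
    print(n, tfidf(docs, n)[0])
```

In this sketch, a term that occurs in every document (here the unigram "you") receives a weight of zero, while longer n-grams become sparser as n grows. That sparsity is one reason short texts may behave differently under unigram, bigram, and trigram settings.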
DOI
10.5334/dsj-2024-031