Textual Feature Extraction Using Ant Colony Optimization for Hate Speech Classification

Cited by: 7
Authors
Gite, Shilpa [1]
Patil, Shruti [1]
Dharrao, Deepak [2]
Yadav, Madhuri [1]
Basak, Sneha [1]
Rajendran, Arundarasi [1]
Kotecha, Ketan [3]
Affiliations
[1] Symbiosis International (Deemed University), Symbiosis Institute of Technology, Symbiosis Centre for Applied Artificial Intelligence, Department of Artificial Intelligence & Machine Learning, Pune 412115, India
[2] Symbiosis International (Deemed University), Symbiosis Institute of Technology, Department of Computer Science & Engineering, Pune 412115, India
[3] Symbiosis International (Deemed University), Symbiosis Institute of Technology, Symbiosis Centre for Applied Artificial Intelligence, Pune 412115, India
Keywords
feature engineering; Term Frequency-Inverse Document Frequency (TF-IDF); Bag of Words (BoW); Chi-square test; Ant Colony Optimization (ACO); machine learning; feature selection method; logistic regression
DOI
10.3390/bdcc7010045
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection and feature extraction have long been considered crucial because they remove redundant and irrelevant features, reduce the size of the vector space, control computational time, and improve classification accuracy, especially in text categorization. These feature engineering techniques can be further optimized with optimization algorithms. This paper proposes a framework that applies one such algorithm, Ant Colony Optimization (ACO), in combination with different feature selection and feature extraction techniques on textual and numerical datasets, using four machine learning (ML) models: Logistic Regression (LR), K-Nearest Neighbor (KNN), Stochastic Gradient Descent (SGD), and Random Forest (RF). The aim is to compare the results obtained on the two datasets. The proposed feature selection and feature extraction techniques help enhance the performance of the machine learning models. The numerical and text-based datasets are used for stroke prediction and hate speech detection, respectively. The text dataset was prepared by extracting tweets with positive, negative, and neutral sentiments through the Twitter API. The largest accuracy improvement, 10.07%, is observed for Random Forest with TF-IDF feature extraction when ACO is applied. The study also highlights limitations of text data that hinder the performance of machine learning models, which account for an accuracy gap of almost 18.43% compared with the numerical data.
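To make the ACO feature-selection step more concrete, here is a minimal sketch of a basic Ant System-style loop over a feature set, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `aco_feature_selection`, its parameters (`alpha`, `beta`, `rho`, ant and iteration counts), the per-feature `heuristic` scores (which could be, for example, chi-square scores of TF-IDF or BoW features), and the `fitness` callable (a stand-in for a classifier's validation accuracy) are all hypothetical choices made for this example.

```python
import random


def aco_feature_selection(heuristic, fitness, k, n_ants=20, n_iters=30,
                          alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Select k feature indices with a basic Ant System-style ACO loop.

    heuristic: positive per-feature desirability scores (eta); hypothetical,
        e.g. chi-square scores of text features.
    fitness: callable mapping a tuple of indices to a non-negative score to
        maximize; hypothetical stand-in for classifier accuracy in [0, 1].
    """
    rng = random.Random(seed)
    n = len(heuristic)
    pheromone = [1.0] * n  # tau: uniform initial trail on every feature
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iter_best, iter_score = None, float("-inf")
        for _ in range(n_ants):
            chosen, candidates = [], list(range(n))
            # Each ant adds k distinct features one at a time, picking each
            # with probability proportional to tau^alpha * eta^beta.
            for _ in range(k):
                weights = [pheromone[j] ** alpha * heuristic[j] ** beta
                           for j in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                chosen.append(pick)
                candidates.remove(pick)
            subset = tuple(sorted(chosen))
            score = fitness(subset)
            if score > iter_score:
                iter_best, iter_score = subset, score
        # Keep the best subset seen across all iterations.
        if iter_score > best_score:
            best_subset, best_score = iter_best, iter_score
        # Evaporate every trail (floored to stay positive), then reinforce
        # the features of this iteration's best subset by its score.
        pheromone = [max((1.0 - rho) * t, 1e-9) for t in pheromone]
        for j in iter_best:
            pheromone[j] += iter_score
    return list(best_subset), best_score


# Toy usage: features 0 and 1 are the "informative" ones (made-up scores).
relevance = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1]


def toy_fitness(subset):
    # Hypothetical fitness: mean relevance of the chosen features.
    return sum(relevance[j] for j in subset) / len(subset)


best, score = aco_feature_selection(relevance, toy_fitness, k=2)
print(best, score)
```

In a real pipeline, `toy_fitness` would be replaced by training one of the classifiers named above (LR, KNN, SGD, or RF) on the selected feature columns and returning its validation accuracy. Depositing pheromone only on the iteration-best subset is one common design choice; other ACO variants let every ant deposit in proportion to its own score.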
Pages: 23