Bias analysis in text classification for highly skewed data

Cited by: 0
Authors
Tang, L [1]
Liu, H [1]
Affiliations
[1] Arizona State Univ, Dept Comp Sci & Engn, Tempe, AZ 85287 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is often applied to high-dimensional data as a preprocessing step in text classification. When dealing with highly skewed data, we observe that typical feature selection metrics such as information gain or chi-squared are biased toward selecting features for the minor class, whereas bi-normal separation can select features for both the minor and major classes. In this work, we use bias analysis to investigate how these feature selection metrics affect the performance of frequently used classifiers such as Decision Trees, Naive Bayes, and Support Vector Machines on highly skewed data. We consider three types of bias: metric bias, class bias, and classifier bias. Extensive experiments are designed to understand how these biases can be employed in concert, and efficiently, to achieve good classification performance. We report our findings and present recommended approaches to text classification based on the bias analysis and the empirical study.
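The three metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes the standard textbook definitions of chi-squared, information gain, and bi-normal separation, computed from document counts for a single binary feature. The function names, the argument layout (tp, fp, pos, neg), and the clipping value eps=0.0005 for bi-normal separation are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def entropy(*counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def chi_squared(tp, fp, pos, neg):
    """Chi-squared statistic of the 2x2 feature/class contingency table.

    tp, fp: minor-class and major-class documents containing the feature.
    pos, neg: total documents in the minor and major class.
    """
    fn, tn = pos - tp, neg - fp
    n = pos + neg
    denom = (tp + fp) * (fn + tn) * pos * neg
    return n * (tp * tn - fp * fn) ** 2 / denom if denom else 0.0


def information_gain(tp, fp, pos, neg):
    """Class entropy minus the expected class entropy after observing the feature."""
    fn, tn = pos - tp, neg - fp
    n = pos + neg
    present, absent = tp + fp, fn + tn
    conditional = present / n * entropy(tp, fp) + absent / n * entropy(fn, tn)
    return entropy(pos, neg) - conditional


def bns(tp, fp, pos, neg, eps=0.0005):
    """Bi-normal separation: |F^-1(tpr) - F^-1(fpr)|, F^-1 = inverse normal CDF.

    Rates are clipped to [eps, 1 - eps] (assumed value) so F^-1 stays finite.
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))


if __name__ == "__main__":
    # Hypothetical skewed corpus: 50 minor-class and 950 major-class documents.
    pos, neg = 50, 950
    features = {
        "minor-class indicator": (40, 10),
        "major-class indicator": (5, 600),
        "uninformative": (5, 95),
    }
    for name, (tp, fp) in features.items():
        print(
            f"{name:>22}: chi2={chi_squared(tp, fp, pos, neg):9.2f} "
            f"ig={information_gain(tp, fp, pos, neg):.4f} "
            f"bns={bns(tp, fp, pos, neg):.3f}"
        )
```

Running the script prints the three scores for each hypothetical feature, which makes it possible to compare how each metric ranks a feature that indicates the minor class against one that indicates the major class. The counts are invented for illustration and are not results from the paper.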
Pages: 781-784
Number of pages: 4