Imbalanced Data Handling in Multi-label Aspect Categorization using Oversampling and Ensemble Learning

Cited by: 0
Authors
Alnatara, Wildan Dicky [1]
Khodra, Masayu Leylia [1]
Affiliation
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
Keywords
aspect categorization; imbalanced multilabel data; Cross-Coupling Aggregation; Multilabel Synthetic Minority Over-sampling Technique; Multilabel Synthetic Oversampling approach based on the Local distribution of labels
DOI
10.1109/icacsis51025.2020.9263087
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In sentiment analysis, aspect-based sentiment analysis (ABSA) provides more detailed information about user sentiment toward a product than document-level or sentence-level analysis. Aspect categorization is one of the ABSA tasks; it determines which aspects a review text relates to. The task works on multilabel data, which usually have an uneven distribution of aspect occurrences, i.e., an imbalanced data condition. This paper uses 9,284 user review texts from the hotel domain. We employ three techniques to address imbalanced multilabel data, namely cross-coupling aggregation (COCOA), multilabel synthetic minority oversampling technique (MLSMOTE), and multilabel synthetic oversampling approach based on the local distribution of labels (MLSOL). A Convolutional Neural Network (CNN)-Classifier Chain (CC)-Extreme Gradient Boosting (XGBoost) model is employed both as the baseline and as the base architecture to which the three techniques are applied. COCOA and MLSMOTE perform best: COCOA achieved an F1-macro of 0.9272 and MLSMOTE an F1-macro of 0.9276, compared with 0.9261 for the baseline. The best COCOA configuration uses four parameters: binary relevance mode = smote-oversampling, multiclass mode = smote-oversampling, random state = 10, and binary relevance ratio = 0.5. The best MLSMOTE configuration uses two parameters: number of neighbors = 5 and random state = 42.
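MLSMOTE, one of the two best performers above, can be illustrated with a minimal NumPy sketch that follows the algorithm as published by Charte et al. (2015). It is not the authors' implementation, and the abstract does not say at which point of the CNN-CC-XGBoost pipeline oversampling is applied. The sketch assumes a dense numeric feature matrix (for instance, hypothetical CNN-derived features), Euclidean distance, and the "ranking" rule for synthetic labels. `irlbl` and `mlsmote` are hypothetical helper names, and `k=5`, `random_state=42` only mirror the reported best configuration; the seed drives NumPy's generator here and will not reproduce the paper's samples.

```python
import numpy as np


def irlbl(Y):
    """Imbalance ratio per label (IRLbl): count of the most frequent label
    divided by each label's own count. Assumes every label occurs at least once."""
    counts = Y.sum(axis=0).astype(float)
    return counts.max() / counts


def mlsmote(X, Y, k=5, random_state=42):
    """Minimal MLSMOTE sketch: oversample every label whose IRLbl exceeds MeanIR.

    X: (n_samples, n_features) numeric features (assumption: dense vectors).
    Y: (n_samples, n_labels) binary label matrix.
    Returns the original data with the synthetic samples appended.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    ir = irlbl(Y)
    mean_ir = ir.mean()  # MeanIR
    new_X, new_Y = [], []
    for label in np.flatnonzero(ir > mean_ir):  # minority labels only
        bag = np.flatnonzero(Y[:, label] == 1)  # samples carrying this label
        if len(bag) < 2:
            continue  # no neighbour to interpolate with
        kk = min(k, len(bag) - 1)
        for i in bag:
            # Euclidean distances from the seed to every sample in the minority bag
            d = np.linalg.norm(X[bag] - X[i], axis=1)
            d[bag == i] = np.inf  # exclude the seed sample itself
            neigh = bag[np.argsort(d)[:kk]]  # k nearest neighbours
            ref = rng.choice(neigh)  # random reference neighbour
            # Features: random per-feature interpolation between seed and reference
            new_X.append(X[i] + rng.random(X.shape[1]) * (X[ref] - X[i]))
            # Labels ("ranking"): keep labels held by more than half of seed + neighbours
            votes = Y[i] + Y[neigh].sum(axis=0)
            new_Y.append((votes > (kk + 1) / 2).astype(Y.dtype))
    if not new_X:
        return X, Y
    return np.vstack([X, np.array(new_X)]), np.vstack([Y, np.array(new_Y)])


# Toy usage with hypothetical data: label 2 is rare, so only it is oversampled.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Y = np.zeros((20, 3), dtype=int)
Y[:, 0] = 1   # present in all 20 samples
Y[2:, 1] = 1  # present in 18 samples
Y[:3, 2] = 1  # present in 3 samples (minority label)
X_res, Y_res = mlsmote(X, Y, k=5, random_state=42)
print(X_res.shape, Y_res.shape)  # (23, 4) (23, 3): one synthetic sample per seed
```

A label counts as a minority label when its IRLbl exceeds the mean ratio MeanIR, so in the toy data only label 2 (IRLbl = 20/3) qualifies. Each synthetic sample lies on a per-feature segment between its seed and one random neighbour, which keeps it inside the region already occupied by the minority label; its label set is decided by majority vote, so co-occurring labels are carried over rather than copied blindly from the seed.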
Pages: 165-170
Page count: 6