Imbalanced Data Handling in Multi-label Aspect Categorization using Oversampling and Ensemble Learning

Cited by: 0
Authors
Alnatara, Wildan Dicky [1]
Khodra, Masayu Leylia [1]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
Keywords
aspect categorization; imbalanced multilabel data; Cross-Coupling Aggregation; Multilabel Synthetic Minority Over-sampling Technique; Multilabel Synthetic Oversampling approach based on the Local distribution of labels;
DOI
10.1109/icacsis51025.2020.9263087
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
In sentiment analysis, aspect-based sentiment analysis (ABSA) provides more detailed information about user sentiment toward a product than document-level or sentence-level analysis. Aspect categorization is one of the ABSA tasks; it identifies which aspects a review text relates to. The task operates on multilabel data, in which aspect occurrences are usually unevenly distributed, that is, the data are imbalanced. This paper uses 9,284 user review texts from the hotel domain. We employ three techniques to address imbalanced multilabel data: cross-coupling aggregation (COCOA), the multilabel synthetic minority oversampling technique (MLSMOTE), and the multilabel synthetic oversampling approach based on the local distribution of labels (MLSOL). A Convolutional Neural Network (CNN)-Classifier Chain (CC)-Extreme Gradient Boosting (XGBoost) model serves both as the baseline and as the base architecture to which these three techniques are applied. COCOA and MLSMOTE perform best: COCOA achieves an F1-macro of 0.9272 and MLSMOTE 0.9276, compared with 0.9261 for the baseline. The best COCOA configuration uses four parameters: binary relevance mode = smote-oversampling, multiclass mode = smote-oversampling, random state = 10, and binary relevance ratio = 0.5. The best MLSMOTE configuration uses two parameters: number of neighbors = 5 and random state = 42.
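As a rough illustration of the oversampling idea named in the abstract, the following is a minimal NumPy sketch of MLSMOTE-style sample generation. It is not the authors' implementation: the function names `irlbl` and `mlsmote_sketch` are hypothetical, the feature matrix is assumed to be dense numeric vectors rather than CNN features, and label assignment by majority vote among the seed and its neighbours is an assumption about the ranking variant. The defaults `k=5` and `random_state=42` mirror the MLSMOTE configuration reported above.

```python
import numpy as np


def irlbl(Y):
    """Per-label imbalance ratio: count of the most frequent label
    divided by the count of each label (empty labels are clamped to 1)."""
    counts = Y.sum(axis=0).astype(float)
    return counts.max() / np.maximum(counts, 1.0)


def mlsmote_sketch(X, Y, k=5, random_state=42):
    """Append synthetic samples for labels whose imbalance ratio
    exceeds the mean imbalance ratio (illustrative sketch only)."""
    rng = np.random.default_rng(random_state)
    ratios = irlbl(Y)
    mean_ir = ratios.mean()
    new_X, new_Y = [], []
    for label in np.where(ratios > mean_ir)[0]:
        bag = np.where(Y[:, label] == 1)[0]  # samples carrying the minority label
        for i in bag:
            others = bag[bag != i]
            if len(others) == 0:
                continue  # no neighbour to interpolate with
            dists = np.linalg.norm(X[others] - X[i], axis=1)
            neighbours = others[np.argsort(dists)[:k]]
            ref = rng.choice(neighbours)
            # Features: random point on the segment between seed and neighbour.
            new_X.append(X[i] + rng.random() * (X[ref] - X[i]))
            # Labels: kept when present in more than half of seed + neighbours.
            votes = Y[i] + Y[neighbours].sum(axis=0)
            new_Y.append((votes > (len(neighbours) + 1) / 2).astype(int))
    if not new_X:
        return X, Y
    return np.vstack([X, np.array(new_X)]), np.vstack([Y, np.array(new_Y)])
```

In a pipeline like the one described, such a step would be applied to the training split only, before fitting the classifier, so that the reported F1-macro is measured on untouched test data.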
Pages: 165-170
Number of pages: 6
Related papers
50 in total
  • [1] Label correlation guided borderline oversampling for imbalanced multi-label data learning
    Zhang, Kai
    Mao, Zhaoyang
    Cao, Peng
    Liang, Wei
    Yang, Jinzhu
    Li, Weiping
    Zaiane, Osmar R.
    KNOWLEDGE-BASED SYSTEMS, 2023, 279
  • [2] Handling Imbalanced Dataset in Multi-label Text Categorization using Bagging and Adaptive Boosting
    Winata, Genta Indra
    Khodra, Masayu Leylia
    5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS 2015, 2015: 500 - 505
  • [3] MLAWSMOTE: Oversampling in Imbalanced Multi-label Classification with Missing Labels by Learning Label Correlation Matrix
    Mao, Jian
    Huang, Kai
    Liu, Jinming
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [4] An Imbalanced Multi-Label Data Ensemble Learning Method Based on Safe Under-Sampling
    Sun, Zhong-Bin
    Diao, Yu-Xuan
    Ma, Su-Yang
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (10): 3392 - 3408
  • [5] Multi-label Ensemble Learning
    Shi, Chuan
    Kong, Xiangnan
    Yu, Philip S.
    Wang, Bai
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT III, 2011, 6913 : 223 - 239
  • [6] Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques
    Pristyanto, Yoga
    Nugraha, Anggit Ferdita
    Pratama, Irfan
    Dahlan, Akhmad
    Wirasakti, Lucky Adhikrisna
    PROCEEDINGS OF THE 2021 15TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2021), 2021
  • [7] Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques
    Hu, Ya-Han
    Liu, Ting-Hsuan
    Tsai, Chih-Fong
    Lin, Yu-Jung
    APPLIED ARTIFICIAL INTELLIGENCE, 2025, 39 (01)
  • [8] Pseudo Labels for Imbalanced Multi-Label Learning
    Zeng, Wenrong
    Chen, Xuewen
    Cheng, Hong
    2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2014: 25 - 31
  • [9] Imbalanced and missing multi-label data learning with global and local structure
    Su, Xinpei
    Xu, Yitian
    INFORMATION SCIENCES, 2024, 677
  • [10] A Multi-label Multimodal Deep Learning Framework for Imbalanced Data Classification
    Pouyanfar, Samira
    Wang, Tianyi
    Chen, Shu-Ching
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019: 199 - 204