Imbalanced Data Handling in Multi-label Aspect Categorization using Oversampling and Ensemble Learning

Authors
Alnatara, Wildan Dicky [1]
Khodra, Masayu Leylia [1]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
Keywords
aspect categorization; imbalanced multilabel data; Cross-Coupling Aggregation; Multilabel Synthetic Minority Over-sampling Technique; Multilabel Synthetic Oversampling approach based on the Local distribution of labels
DOI
10.1109/icacsis51025.2020.9263087
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In sentiment analysis, aspect-based sentiment analysis (ABSA) provides more detailed information about user sentiment toward a product than document-level or sentence-level analysis. Aspect categorization is one of the ABSA tasks; it identifies which aspects a review text relates to. This task works on multilabel data, which usually have an uneven distribution of aspect occurrences, that is, an imbalanced data condition. This paper uses 9284 user review texts from the hotel domain. We employ three techniques to address imbalanced multilabel data: cross-coupling aggregation (COCOA), the multilabel synthetic minority oversampling technique (MLSMOTE), and the multilabel synthetic oversampling approach based on the local distribution of labels (MLSOL). A Convolutional Neural Network (CNN)-Classifier Chain (CC)-Extreme Gradient Boosting (XGBoost) architecture serves both as the baseline and as the base architecture to which the three imbalance-handling techniques are applied. COCOA and MLSMOTE are the best performers: COCOA achieved an F1-Macro of 0.9272 and MLSMOTE 0.9276, compared with 0.9261 for the baseline. The best COCOA configuration uses four parameters: binary relevance mode = smote-oversampling, multiclass mode = smote-oversampling, random state = 10, and binary relevance ratio = 0.5. The best MLSMOTE configuration uses two parameters: number of neighbors = 5 and random state = 42.
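
The abstract reports results as F1-Macro, which averages the per-label F1 scores so that a rare aspect counts as much as a frequent one. The following is a minimal sketch of that metric for multilabel predictions. It is not the authors' evaluation code; the function names and the toy label matrix (three hypothetical aspects, four reviews) are assumptions made for illustration only.

```python
from typing import List, Sequence


def f1_per_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Return one F1 score per label column of two binary indicator matrices."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: F1 is 0.0 when a label has no positives at all.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


# Hypothetical toy data: columns are three made-up aspects, rows are reviews.
# The third aspect is rare (one positive) and the prediction misses it.
y_true = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0]]

print(f1_per_label(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

In this toy case a single missed positive on the rare aspect drives that label's F1 to zero and pulls the macro average down by a third, which illustrates why a macro-averaged score is sensitive to how minority labels are handled.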
Pages: 165-170
Page count: 6