Imbalanced Data Handling in Multi-label Aspect Categorization using Oversampling and Ensemble Learning

Authors
Alnatara, Wildan Dicky [1]
Khodra, Masayu Leylia [1]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
Keywords
aspect categorization; imbalanced multilabel data; Cross-Coupling Aggregation; Multilabel Synthetic Minority Over-sampling Technique; Multilabel Synthetic Oversampling approach based on the Local distribution of labels
DOI
10.1109/icacsis51025.2020.9263087
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In sentiment analysis, aspect-based sentiment analysis (ABSA) provides more detailed information about user sentiment toward a product than document-level or sentence-level analysis. Aspect categorization is one of the ABSA tasks; it determines which aspects a review text relates to. This task works on multilabel data, which usually have an uneven distribution of aspect occurrences, that is, an imbalanced data condition. This paper uses 9284 data instances drawn from user review text in the hotel domain. We employ three techniques to address imbalanced multilabel data: cross-coupling aggregation (COCOA), the multilabel synthetic minority oversampling technique (MLSMOTE), and the multilabel synthetic oversampling approach based on the local distribution of labels (MLSOL). A Convolutional Neural Network (CNN)-Classifier Chain (CC)-Extreme Gradient Boosting (XGBoost) model serves both as the baseline and as the base architecture to which these three techniques are applied. COCOA and MLSMOTE are the best performers: COCOA achieves an F1-Macro of 0.9272 and MLSMOTE an F1-Macro of 0.9276, compared with 0.9261 for the baseline. The best COCOA configuration uses four parameters: binary relevance mode = smote-oversampling, multiclass mode = smote-oversampling, random state = 10, and binary relevance ratio = 0.5. The best MLSMOTE configuration uses two parameters: number of neighbors = 5 and random state = 42.
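The abstract reports its results as F1-Macro, which averages the per-label F1 scores so that rare aspects weigh as much as frequent ones. The following is a minimal sketch of that metric on a toy label matrix; the function name `macro_f1` and the toy data are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over a binary multilabel matrix.

    Rows are samples, columns are labels (aspects). A label with no
    true positives, false positives, or false negatives scores 0.0.
    """
    n_labels = len(y_true[0])
    scores: List[float] = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data (hypothetical): label 0 is frequent, labels 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 1]]
# The prediction misses the single occurrence of rare label 1.
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 1]]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

In this toy case only one of twelve label decisions is wrong, yet the macro average drops to about two thirds because the missed label contributes a per-label F1 of zero. This sensitivity to minority labels is why macro averaging is a common choice when evaluating methods for imbalanced multilabel data.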
Pages: 165-170
Number of pages: 6