Exploiting Domain Knowledge to Address Class Imbalance in Meteorological Data Mining

Cited by: 1
Authors
Tsagalidis, Evangelos [1]
Evangelidis, Georgios [2]
Affiliations
[1] Hellen Agr Insurance Org, Meteorol Applicat Ctr, Int Airport Makedonia, Thessaloniki 55103, Greece
[2] Univ Macedonia, Sch Informat Sci, Dept Appl Informat, Thessaloniki 54636, Greece
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 23
Keywords
meteorological data mining and machine learning; class imbalance; classification; randomized undersampling; SMOTE oversampling; undersampling using temporal distances
DOI
10.3390/app122312402
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
We address the problem of class imbalance in data mining and machine learning classification algorithms. Class imbalance arises when some class labels are represented by far fewer examples in the training dataset than the remaining labels. Those minority class labels are usually the most important ones, so classifiers should above all predict them well. The problem is well studied, and various sampling-based strategies are used to balance the representation of the labels in the training dataset and improve classifier performance. We explore whether expert knowledge in the field of Meteorology can enhance the quality of the training dataset when it is treated by pre-processing sampling strategies. We propose four new sampling strategies based on our expertise in the data domain and compare their effectiveness against the established sampling strategies used in the literature. Our sampling strategies, which take advantage of expert knowledge from the data domain, achieve class balancing that improves the performance of most classifiers.
Pages: 12
Related papers
50 in total
  • [31] Exploiting Domain Knowledge for Object Discovery
    Collet, Alvaro
    Xiong, Bo
    Gurau, Corina
    Hebert, Martial
    Srinivasa, Siddhartha S.
    2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2013, : 2118 - 2125
  • [32] Exploiting domain knowledge for approximate diagnosis
    ten Teije, A
    van Harmelen, F
    IJCAI-97 - PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 AND 2, 1997, : 454 - 459
  • [33] Exploiting Qualitative Domain Knowledge for Learning Bayesian Network Parameters with Incomplete Data
    Liao, Wenhui
    Ji, Qiang
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 543 - 546
  • [34] Exploiting Domain Knowledge and Public Linked Data to Extract Opinions from Reviews
    Alfrjani, Rowida
    Osman, Taha
    Cosma, Georgina
    PROCEEDINGS OF 2017 2ND INTERNATIONAL CONFERENCE ON KNOWLEDGE ENGINEERING AND APPLICATIONS (ICKEA), 2017, : 98 - 102
  • [35] A weighted rough set method to address the class imbalance problem
    Liu, Jin-Fu
    Yu, Da-Ren
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 3693 - 3698
  • [36] Survey of Fuzzy based techniques to address Class Imbalance Problem
    Kaur, Prabhjot
    Gupta, Anshul
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2602 - 2604
  • [37] Dual Focal Loss to address class imbalance in semantic segmentation
    Hossain, Md Sazzad
    Betts, John M.
    Paplinski, Andrew P.
    NEUROCOMPUTING, 2021, 462 : 69 - 87
  • [38] VISAL-A novel learning strategy to address class imbalance
    Vamsidhar, S. Sree Rama
    Sivapuram, Arun Kumar
    Ravi, Vaishnavi
    Senthil, Gowtham
    Gorthi, Rama Krishna
    NEURAL NETWORKS, 2023, 161 : 178 - 184
  • [40] Incorporating domain knowledge into data mining process: An ontology based framework
    Pan, Ding
    Shen, Jun-Yi
    Zhou, Mu-Xin
    Wuhan University Journal of Natural Sciences, 2006, 11 (01) : 165 - 169