Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition

Cited by: 15
Authors
Alharbi, Fayez [1, 2]
Ouarbya, Lahcen [2 ]
Ward, Jamie A. [2 ]
Affiliations
[1] Majmaah Univ, Comp Sci & Informat Technol Coll, Al Majmaah 15341, Saudi Arabia
[2] Univ London, Dept Comp, London SE14 6NW, England
Keywords
activity recognition; wearable sensors; imbalanced activities; sampling methods; SMOTE;
DOI
10.3390/s22041373
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users in their daily lives in application areas such as healthcare, physical therapy, and fitness. One of the main challenges for HAR, particularly when using supervised learning methods, is obtaining balanced data for algorithm optimisation and testing. Because people perform some activities more than others (e.g., they walk more than they run), HAR datasets are typically imbalanced. The under-representation of minority classes hinders the ability of HAR classifiers to recognise new instances of those activities. We introduce three novel hybrid sampling strategies that generate more diverse synthetic samples to overcome the class imbalance problem. The first strategy, which we call the distance-based method (DBM), combines the Synthetic Minority Oversampling Technique (SMOTE) with Random_SMOTE, both of which are built around the k-nearest neighbors (KNN) algorithm. The second technique, referred to as the noise detection-based method (NDBM), combines SMOTE Tomek links (SMOTE_Tomeklinks) and the modified synthetic minority oversampling technique (MSMOTE). The third approach, which we call the cluster-based method (CBM), combines Cluster-Based Synthetic Oversampling (CBSO) and the Proximity Weighted Synthetic Oversampling Technique (ProWSyn). We compare the performance of the proposed hybrid methods with that of their individual constituent methods and a baseline, using accelerometer data from three commonly used benchmark datasets. We show that DBM, NDBM, and CBM reduce the impact of class imbalance and improve F1 scores by 9 to 20 percentage points compared with their constituent sampling methods. CBM performs significantly better than the others under a Friedman test; however, DBM has lower computational requirements.
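As a rough illustration of the KNN-based interpolation that SMOTE-style oversamplers rely on, the following minimal sketch generates synthetic minority samples in pure Python. It shows only the basic SMOTE idea, not the DBM, NDBM, or CBM hybrids described in the abstract; the function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority neighbours (basic SMOTE idea).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Assumed toy data: two features per window for a rare activity.
running = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote_sketch(running, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class. This limited diversity is the kind of constraint that hybrid strategies aim to relax.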
Pages: 20
Related papers
50 in total
  • [21] Cross-Concatenation: Tackling Uncertainty in Imbalanced Big Data Classification
    Mansourifar, Hadi
    Shi, Weidong
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 867 - 875
  • [22] Classifying Imbalanced Multi-modal Sensor Data for Human Activity Recognition in a Smart Home using Deep Learning
    Alani, Ali A.
    Cosma, Georgina
    Taherkhani, Aboozar
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [23] Data Sampling Strategies for Click Fraud Detection Using Imbalanced User Click Data of Online Advertising: An Empirical Review
    Sisodia, Deepti
    Sisodia, Dilip Singh
    [J]. IETE TECHNICAL REVIEW, 2022, 39 (04) : 789 - 798
  • [24] A Hybrid Sampling SVM Approach to Imbalanced Data Classification
    Wang, Qiang
    [J]. ABSTRACT AND APPLIED ANALYSIS, 2014,
  • [25] Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    Napolitano, Amri
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2011, 41 (03) : 552 - 568
  • [26] Exploring Data Sampling Techniques for Imbalanced Classification Problems
    Sui, Yu
    Zhang, Xiaohui
    Huan, Jiajia
    Hong, Haifeng
    [J]. FOURTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2019, 11198
  • [27] Cluster-based sampling of multiclass imbalanced data
    Prachuabsupakij, Wanthanee
    Soonthornphisaj, Nuanwan
    [J]. INTELLIGENT DATA ANALYSIS, 2014, 18 (06) : 1109 - 1135
  • [28] Over-sampling algorithm for imbalanced data classification
    Xu Xiaolong
    Chen Wen
    Sun Yanfei
    [J]. JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2019, 30 (06) : 1182 - 1191
  • [29] CLUS: A New Hybrid Sampling Classification for Imbalanced Data
    Prachuabsupakij, Wanthanee
    [J]. PROCEEDINGS OF THE 2015 12TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2015, : 281 - 286
  • [30] Model-Based Synthetic Sampling for Imbalanced Data
    Liu, Chien-Liang
    Hsieh, Po-Yen
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (08) : 1543 - 1556