Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition

Cited by: 15
Authors
Alharbi, Fayez [1, 2]
Ouarbya, Lahcen [2]
Ward, Jamie A. [2]
Affiliations
[1] Majmaah Univ, Comp Sci & Informat Technol Coll, Al Majmaah 15341, Saudi Arabia
[2] Univ London, Dept Comp, London SE14 6NW, England
Keywords
activity recognition; wearable sensors; imbalanced activities; sampling methods; SMOTE
DOI
10.3390/s22041373
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users in their daily lives in application areas such as healthcare, physical therapy, and fitness. One of the main challenges for HAR, particularly when using supervised learning methods, is obtaining balanced data for algorithm optimisation and testing. As people perform some activities more than others (e.g., walk more than run), HAR datasets are typically imbalanced. The lack of dataset representation from minority classes hinders the ability of HAR classifiers to sufficiently capture new instances of those activities. We introduce three novel hybrid sampling strategies to generate more diverse synthetic samples to overcome the class imbalance problem. The first strategy, which we call the distance-based method (DBM), combines the Synthetic Minority Oversampling Technique (SMOTE) with Random_SMOTE, both of which are built around k-nearest neighbors (KNN). The second technique, referred to as the noise detection-based method (NDBM), combines SMOTE Tomek links (SMOTE_Tomeklinks) and the modified synthetic minority oversampling technique (MSMOTE). The third approach, which we call the cluster-based method (CBM), combines Cluster-Based Synthetic Oversampling (CBSO) and the Proximity Weighted Synthetic Oversampling Technique (ProWSyn). We compare the performance of the proposed hybrid methods to the individual constituent methods and a baseline using accelerometer data from three commonly used benchmark datasets. We show that DBM, NDBM, and CBM reduce the impact of class imbalance and improve F1 scores by 9 to 20 percentage points compared to their constituent sampling methods. CBM performs significantly better than the others under a Friedman test; however, DBM has lower computational requirements.
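For readers unfamiliar with the KNN-based interpolation that SMOTE relies on, the following is a minimal sketch of the basic idea only. It is not the authors' DBM, NDBM, or CBM hybrid, and it is not a library implementation. The function name `smote_sketch`, its parameters, and the toy feature values are hypothetical and chosen purely for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority-class
    sample and one of its k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the line segment between base and neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D feature vectors for a rare activity class (values are made up).
minority_windows = [(0.9, 1.1), (1.0, 1.3), (1.2, 1.0), (1.1, 1.4)]
new_points = smote_sketch(minority_windows, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the generated data stays inside the region already occupied by that class. The variants named in the abstract change how base samples and neighbours are chosen, which this sketch does not attempt to reproduce.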
Pages: 20