Machine Learning with Imbalanced EEG Datasets using Outlier-based Sampling

Authors
Islah, Nizar [1]
Koerner, Jamie [1]
Genov, Roman [1]
Valiante, Taufik A. [1, 2]
O'Leary, Gerard [1]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 2E4, Canada
[2] Univ Toronto, Dept Surg Neurosurg, Toronto, ON M5T 2S8, Canada
Keywords
DOI
Not available
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Epilepsy is a neurological disorder that causes seizures in over 65 million people worldwide. Recently developed implantable therapeutic devices aim to prevent symptoms by applying acute electrical stimulation to the seizure-generating brain region in response to activity detected by on-device machine learning hardware. Many training algorithms require an equal number of examples for each target class (e.g., normal activity and seizures), and performance can suffer if this condition is not satisfied. In the case of epilepsy, poor performance can cause seizures to be missed or stimulation to be applied erroneously. Clinical EEG recordings contain abundant normal (interictal) data, whereas seizures are rare events (less than 1% of the dataset), so the data available for training are severely imbalanced. Several conventional pre-processing methods address imbalanced class learning, such as down-sampling the majority class and up-sampling the minority class, but each has performance drawbacks. This paper presents an improved method that reduces the majority class to the most effective interictal outlier samples. Outliers are determined by using Exponentially Decaying Memory Signal Energy (EDMSE) features with Isolation Forests and an ANOVA-based method, which compares a moving feature window to a baseline reference window. Outlier-based sampling is tested with two classifiers (KNN and Logistic Regression) and achieves higher accuracy (approximately 2% increase), fewer false positives (approximately 38% decrease), and lower latency (approximately 3 seconds shorter) compared to conventional training set pre-processing methods.
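The ANOVA-based step described in the abstract, comparing a moving feature window to a baseline reference window, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define EDMSE, so the recursive update `e[t] = decay * e[t-1] + (1 - decay) * x[t]**2` is an assumed form, and the function names, `decay`, window lengths, and the `keep` parameter are all hypothetical. The Isolation Forest step is omitted.

```python
import random
from statistics import fmean


def edmse(signal, decay=0.9):
    """Assumed EDMSE form: exponentially decaying average of squared samples."""
    energy, out = 0.0, []
    for x in signal:
        energy = decay * energy + (1.0 - decay) * x * x
        out.append(energy)
    return out


def f_statistic(a, b):
    """One-way ANOVA F statistic for two groups of feature values."""
    na, nb = len(a), len(b)
    ma, mb = fmean(a), fmean(b)
    grand = (sum(a) + sum(b)) / (na + nb)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    # Two groups give 1 between-group degree of freedom.
    return ss_between / (ss_within / (na + nb - 2))


def outlier_windows(features, baseline_len, window_len, keep):
    """Return start indices of the `keep` windows that differ most from the baseline."""
    baseline = features[:baseline_len]
    scored = []
    # Hypothetical choice: non-overlapping windows after the baseline segment.
    for start in range(baseline_len, len(features) - window_len + 1, window_len):
        window = features[start:start + window_len]
        scored.append((f_statistic(baseline, window), start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:keep])


if __name__ == "__main__":
    # Synthetic stand-in for interictal EEG: unit-variance noise with one high-energy burst.
    rng = random.Random(0)
    signal = [rng.gauss(0.0, 10.0 if 300 <= t < 350 else 1.0) for t in range(600)]
    features = edmse(signal)
    print(outlier_windows(features, baseline_len=100, window_len=50, keep=1))
```

In this sketch, only the interictal windows with the highest F scores would be retained as the down-sampled majority class, instead of a random subset.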
Pages: 112-115
Number of pages: 4