Machine Learning with Imbalanced EEG Datasets using Outlier-based Sampling

Authors
Islah, Nizar [1]
Koerner, Jamie [1]
Genov, Roman [1]
Valiante, Taufik A. [1, 2]
O'Leary, Gerard [1]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 2E4, Canada
[2] Univ Toronto, Dept Surg Neurosurg, Toronto, ON M5T 2S8, Canada
DOI
Not available
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Epilepsy is a neurological disorder that causes seizures in over 65 million people worldwide. Recently developed implantable therapeutic devices aim to prevent symptoms by applying acute electrical stimulation to the seizure-generating brain region in response to activity detected by on-device machine learning hardware. Many training algorithms require an equal number of examples for each target class (e.g., normal activity and seizures), and performance can suffer if this condition is not satisfied. In the case of epilepsy, poor performance can cause seizures to be missed or stimulation to be applied erroneously. Because normal (interictal) data is abundant in clinical EEG recordings while seizures are rare events (less than 1% of the dataset), the data available for training is severely imbalanced. Several conventional pre-processing methods address imbalanced class learning, such as down-sampling the majority class and up-sampling the minority class, but each has performance drawbacks. This paper presents an improved method that reduces the majority class to the most effective interictal outlier samples. Outliers are determined using Exponentially Decaying Memory Signal Energy (EDMSE) features with Isolation Forests and an ANOVA-based method, which compares a moving feature window to a baseline reference window. Outlier-based sampling is tested with two classifiers (KNN and Logistic Regression) and achieves higher accuracy (approximately 2% increase), fewer false positives (approximately 38% decrease), and lower latency (approximately 3 seconds shorter) compared to conventional training set pre-processing methods.
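The window-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the recursive form of the energy feature, the decay value, the window lengths, and the function names `edm_energy`, `anova_f`, and `outlier_windows` are all assumptions made for illustration, and the Isolation Forest stage is left out entirely. The sketch computes an exponentially decaying average of squared amplitude, scores each moving window against a baseline window with a one-way ANOVA F statistic, and keeps only the highest-scoring majority-class windows.

```python
import math
import random
from statistics import fmean


def edm_energy(signal, decay=0.9):
    """Exponentially decaying running average of squared amplitude.

    Assumed form (not taken from the paper):
    e[n] = decay * e[n-1] + (1 - decay) * x[n]**2
    """
    energy, out = 0.0, []
    for x in signal:
        energy = decay * energy + (1.0 - decay) * x * x
        out.append(energy)
    return out


def anova_f(a, b):
    """One-way ANOVA F statistic for two groups of samples."""
    na, nb = len(a), len(b)
    ma, mb = fmean(a), fmean(b)
    grand = (sum(a) + sum(b)) / (na + nb)
    # Between-group sum of squares (1 degree of freedom for two groups).
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    if ss_within == 0.0:
        return math.inf if ss_between > 0.0 else 0.0
    return ss_between / (ss_within / (na + nb - 2))


def outlier_windows(features, baseline_len, win_len, keep):
    """Start indices of the `keep` windows that differ most from the baseline."""
    baseline = features[:baseline_len]
    scored = []
    for start in range(baseline_len, len(features) - win_len + 1, win_len):
        window = features[start:start + win_len]
        scored.append((anova_f(baseline, window), start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:keep])


# Synthetic stand-in for interictal data: unit-variance noise with one
# high-amplitude burst (hypothetical values, not real EEG).
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(2000)]
for i in range(1200, 1300):
    signal[i] *= 5.0

features = edm_energy(signal)
selected = outlier_windows(features, baseline_len=200, win_len=100, keep=2)
print(selected)  # the burst window starting at sample 1200 is among these
```

In a full pipeline, the selected windows would replace the complete interictal set as the majority class before training a classifier such as KNN or logistic regression. How many windows to keep relative to the number of seizure examples is a design choice this sketch does not address.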
Pages: 112-115
Number of pages: 4