Machine Learning with Imbalanced EEG Datasets using Outlier-based Sampling

Cited by: 0
Authors
Islah, Nizar [1]
Koerner, Jamie [1]
Genov, Roman [1]
Valiante, Taufik A. [1, 2]
O'Leary, Gerard [1]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 2E4, Canada
[2] Univ Toronto, Dept Surg Neurosurg, Toronto, ON M5T 2S8, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Epilepsy is a neurological disorder that causes seizures in over 65 million people worldwide. Recently developed implantable therapeutic devices aim to prevent symptoms by applying acute electrical stimulation to the seizure-generating brain region in response to activity detected by on-device machine learning hardware. Many training algorithms require an equal number of examples for each target class (e.g. normal activity and seizures), and performance can suffer if this condition is not satisfied. In the case of epilepsy, poor performance can cause seizures to be missed, or stimulation to be applied erroneously. Because normal (interictal) data is abundant in clinical EEG recordings while seizures are rare events (less than 1% of the dataset), the data available for training is severely imbalanced. Several conventional pre-processing methods are used to address imbalanced class learning, such as down-sampling the majority class and up-sampling the minority class, but each has performance drawbacks. This paper presents an improved method that reduces the majority class to the most effective interictal outlier samples. Outliers are determined using Exponentially Decaying Memory Signal Energy (EDMSE) features with Isolation Forests and an ANOVA-based method, which compares a moving feature window to a baseline reference window. Outlier-based sampling is tested with two classifiers (KNN and Logistic Regression) and achieves higher accuracy (~2% increase), fewer false positives (~38% decrease), and lower latency (~3 seconds shorter) compared to conventional training set pre-processing methods.
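The ANOVA-based comparison of a moving feature window against a baseline reference window, as described in the abstract, might be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define EDMSE, so the recursion in `edmse`, the `decay` value, the window lengths, and `f_threshold` are all hypothetical, the signal is synthetic, and the Isolation Forest step is omitted entirely.

```python
from statistics import fmean


def edmse(signal, decay=0.5):
    """Running signal energy with exponential forgetting.

    Assumed form (not taken from the paper):
        e[n] = decay * e[n-1] + (1 - decay) * x[n]**2
    """
    energy, out = 0.0, []
    for x in signal:
        energy = decay * energy + (1.0 - decay) * x * x
        out.append(energy)
    return out


def f_statistic(a, b):
    """One-way ANOVA F statistic for two groups of feature values."""
    grand = fmean(a + b)
    mean_a, mean_b = fmean(a), fmean(b)
    ss_between = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
    ss_within = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    df_within = len(a) + len(b) - 2
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    # Two groups, so the between-group degrees of freedom equal 1.
    return ss_between / (ss_within / df_within)


def outlier_windows(features, baseline_len, window_len, f_threshold):
    """Start indices of non-overlapping windows that differ from the baseline.

    The first `baseline_len` feature values act as the reference window;
    each later window is flagged when its F statistic exceeds `f_threshold`.
    """
    baseline = features[:baseline_len]
    starts = []
    for start in range(baseline_len, len(features) - window_len + 1, window_len):
        window = features[start:start + window_len]
        if f_statistic(baseline, window) > f_threshold:
            starts.append(start)
    return starts


# Synthetic "interictal" trace: a repeating low-amplitude pattern with one
# brief high-amplitude burst (samples 200-219) standing in for an outlier.
pattern = [1.0, -0.5, 0.8, -1.2]
signal = [pattern[i % 4] * (6.0 if 200 <= i < 220 else 1.0) for i in range(400)]

features = edmse(signal)
flagged = outlier_windows(features, baseline_len=80, window_len=40, f_threshold=10.0)
print(flagged)
```

In an outlier-based sampling scheme of this kind, only the flagged majority-class windows would be retained for training alongside the seizure examples, rather than a random subset of the interictal data.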
Pages: 112-115
Page count: 4
Related papers
50 in total
  • [1] Certainty-based active learning for sampling imbalanced datasets
    Fu, JuiHsi
    Lee, SingLing
    [J]. NEUROCOMPUTING, 2013, 119 : 350 - 358
  • [2] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Ibrahim, Mohammed H.
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (22) : 15781 - 15806
  • [3] ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
    Mohammed H. IBRAHIM
    [J]. Neural Computing and Applications, 2021, 33 : 15781 - 15806
  • [4] LOFReg: An outlier-based regulariser for deep metric learning
    Kamenou, Eleni
    del Rincon, Jesus Martinez
    Miller, Paul
    Devlin-Hill, Patricia
    Budgett, Samuel
    Angelini, Federico
    Grinyer, Charlotte
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 237
  • [5] A Hybrid Machine Learning Methodology for Imbalanced Datasets
    Lipitakis, Anastasia-Dimitra
    Kotsiantis, Sotirios
    [J]. 5TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS AND APPLICATIONS, IISA 2014, 2014, : 252 - +
  • [6] Effect of Imbalanced Datasets on Security of Industrial IoT Using Machine Learning
    Zolanvari, Maede
    Teixeira, Marcio A.
    Jain, Raj
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2018, : 112 - 117
  • [7] Universum based kernelized weighted extreme learning machine for imbalanced datasets
    Bhagat Singh Raghuwanshi
    Akansha Mangal
    Sanyam Shukla
    [J]. International Journal of Machine Learning and Cybernetics, 2022, 13 : 3387 - 3408
  • [8] Universum based kernelized weighted extreme learning machine for imbalanced datasets
    Raghuwanshi, Bhagat Singh
    Mangal, Akansha
    Shukla, Sanyam
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (11) : 3387 - 3408
  • [9] Fuzzy support vector machine using local outlier factor and intuitionistic fuzzy sets for imbalanced datasets
    Hu, Mengya
    Lu, Shaowu
    [J]. JOURNAL OF CONTROL AND DECISION, 2024,
  • [10] OMCOKE: A Machine Learning Outlier-based Overlapping Clustering Technique for Multi-Label Data Analysis
    Baadel S.
    Thabtah F.
    Lu J.
    Harguem S.
    [J]. Informatica (Slovenia), 2022, 46 (04) : 523 - 530