K Nearest Neighbor OveRsampling approach: An open source Python package for data augmentation

Cited by: 2
Authors
Islam, Ashhadul [1 ]
Belhaouari, Samir Brahim [1 ]
Rehman, Atiq Ur [1 ]
Bensmail, Halima [2 ]
Affiliations
[1] Hamad Bin Khalifa Univ, Div Informat & Comp Technol, Ar Rayyan, Qatar
[2] Qatar Comp Res Inst, Ar Rayyan, Qatar
Keywords
Data augmentation; Machine learning; Imbalanced data; Nearest neighbor; SMOTE
DOI
10.1016/j.simpa.2022.100272
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Data is abundant, but imbalanced datasets remain a recurring problem that hampers classifiers and reduces accuracy. This paper introduces the K Nearest Neighbor OveRsampling (KNNOR) algorithm, a novel data augmentation technique that considers the distribution of the data and takes the k nearest neighbors into account when generating artificial data points. The KNNOR algorithm has outperformed state-of-the-art augmentation algorithms, enabling classifiers to achieve much higher accuracy after artificial minority data points are injected into imbalanced datasets. The method is especially useful for health datasets, where imbalance is common, and can also be applied to lower-dimensional images.
Pages: 3
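
The abstract describes generating artificial minority points from each point's k nearest neighbors. The sketch below illustrates that general idea only; it is a SMOTE-style linear interpolation, not the published KNNOR algorithm, and it does not use the KNNOR package's API. The function name `knn_oversample` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def knn_oversample(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority points.

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbors.
    This is generic SMOTE-style interpolation, not the KNNOR method.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # a point cannot have more than n - 1 neighbors
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances between minority points.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbors

    # Indices of the k nearest neighbors of every minority point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base point and one of its neighbors for each new sample.
    base = rng.integers(0, n, size=n_new)
    pick = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random weight in [0, 1).
    alpha = rng.random((n_new, 1))
    return X_min[base] + alpha * (X_min[pick] - X_min[base])


# Usage: four minority points on the unit square, six synthetic points.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = knn_oversample(X_min, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region the minority class already occupies. How the published method selects points and neighbors based on the data distribution is not stated in this record, so that part is left out of the sketch.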