K Nearest Neighbor OveRsampling approach: An open source python package for data augmentation

Cited by: 2
Authors
Islam, Ashhadul [1 ]
Belhaouari, Samir Brahim [1 ]
Rehman, Atiq Ur [1 ]
Bensmail, Halima [2 ]
Affiliations
[1] Hamad Bin Khalifa Univ, Div Informat & Comp Technol, Ar Rayyan, Qatar
[2] Qatar Comp Res Inst, Ar Rayyan, Qatar
Keywords
Data augmentation; Machine learning; Imbalanced data; Nearest neighbor; SMOTE
DOI
10.1016/j.simpa.2022.100272
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Data is present in abundance, but the problem of imbalanced datasets crops up time and again, vexing classifiers and reducing accuracy. This paper introduces the K Nearest Neighbor OveRsampling (KNNOR) algorithm, a novel data augmentation technique that considers the distribution of the data and takes the k nearest neighbors into account while generating artificial data points. The KNNOR algorithm has outperformed state-of-the-art augmentation algorithms, enabling classifiers to achieve much higher accuracy after artificial minority data points are injected into imbalanced datasets. The method is especially useful for health datasets, where imbalance is common, and can also be applied to low-dimensional images.
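To make the idea of generating artificial minority points from nearest neighbors concrete, here is a minimal sketch. It is a simplified SMOTE-style interpolation, not the KNNOR algorithm itself, whose details this record does not give. The function name `oversample_minority` and its parameters are hypothetical and do not come from the KNNOR package.

```python
import numpy as np


def oversample_minority(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors (SMOTE-style sketch,
    assumed for illustration; not the published KNNOR procedure)."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority points to interpolate")
    k = min(k, n - 1)  # a point cannot be its own neighbor
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between minority points.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude self-matches

    # Indices of the k nearest neighbors of every minority point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base point and one of its neighbors for each new sample.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    nbr = neighbors[base, pick]

    # Place the new point at a random position on the connecting segment.
    alpha = rng.random((n_new, 1))
    return X_min[base] + alpha * (X_min[nbr] - X_min[base])


if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(oversample_minority(minority, n_new=5, k=2))
```

Because each synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class. This is the property that neighbor-based oversampling methods generally rely on.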
Pages: 3
Related papers
50 in total
  • [21] Nmrglue: an open source Python package for the analysis of multidimensional NMR data
    Jonathan J. Helmus
    Christopher P. Jaroniec
    [J]. Journal of Biomolecular NMR, 2013, 55 : 355 - 367
  • [22] MaD GUI: An Open-Source Python Package for Annotation and Analysis of Time-Series Data
    Ollenschlaeger, Malte
    Kuderle, Arne
    Mehringer, Wolfgang
    Seifer, Ann-Kristin
    Winkler, Juergen
    Gassner, Heiko
    Kluge, Felix
    Eskofier, Bjoern M.
    [J]. SENSORS, 2022, 22 (15)
  • [23] Gnssrefl: an open source software package in python for GNSS interferometric reflectometry applications
    Larson, Kristine M.
    [J]. GPS SOLUTIONS, 2024, 28 (04)
  • [24] pyResearchInsights-An open-source Python package for scientific text analysis
    Shetty, Sarthak J.
    Ramesh, Vijay
    [J]. ECOLOGY AND EVOLUTION, 2021, 11 (20) : 13920 - 13929
  • [25] tableone: An open source Python package for producing summary statistics for research papers
    Pollard, Tom J.
    Johnson, Alistair E. W.
    Raffa, Jesse D.
    Mark, Roger G.
    [J]. JAMIA OPEN, 2018, 1 (01) : 26 - 31
  • [26] GMAG: An open-source python package for ground-based magnetometers
    Murphy, Kyle R.
    Rae, I. Jonathan
    Halford, Alexa J.
    Engebretson, Mark
    Russell, Christopher T.
    Matzka, Jurgen
    Johnsen, Magnar G.
    Milling, David K.
    Mann, Ian R.
    Kale, Andy
    Xu, Zhonghua
    Connors, Martin
    Angelopoulos, Vassilis
    Chi, Peter
    Tanskanen, Eija
    [J]. FRONTIERS IN ASTRONOMY AND SPACE SCIENCES, 2022, 9
  • [27] PLACE: An Open-Source Python Package for Laboratory Automation, Control, and Experimentation
    Johnson, Jami L.
    Woerden, Henrik Tom
    van Wijk, Kasper
    [J]. JALA, 2015, 20 (01) : 10 - 16
  • [28] AlphaMap: an open-source Python package for the visual annotation of proteomics data with sequence-specific knowledge
    Voytik, Eugenia
    Bludau, Isabell
    Willems, Sander
    Hansen, Fynn M.
    Brunner, Andreas-David
    Strauss, Maximilian T.
    Mann, Matthias
    [J]. BIOINFORMATICS, 2022, 38 (03) : 849 - 852
  • [29] GDPS: an open-source python-based software package for multi-GNSS data preprocessing
    Lu, Liguo
    Hu, Weijian
    Wu, Tangting
    [J]. GPS SOLUTIONS, 2024, 28 (03)
  • [30] Python Package abstcal: An Open-Source Tool for Calculating Abstinence From Timeline Followback Data Comment
    Cui, Yong
    Robinson, Jason D.
    Rymer, Rudel E.
    Minnix, Jennifer A.
    Cinciripini, Paul M.
    [J]. NICOTINE & TOBACCO RESEARCH, 2022, 24 (01) : 146 - 148