Gaussian distribution resampling via Chebyshev distance for food computing

Cited by: 5
Authors
Li, Tianle [1]
Zuo, Enguang [1]
Chen, Chen [1]
Chen, Cheng [2]
Zhong, Jie [2]
Yan, Junyi [2]
Lv, Xiaoyi [2]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Xinjiang, Peoples R China
[2] Xinjiang Univ, Coll Software, Urumqi 830046, Xinjiang, Peoples R China
Keywords
Food computing; Imbalanced learning; Gaussian distribution oversampling; Random undersampling; Chebyshev distance; SMOTE
DOI
10.1016/j.asoc.2023.111103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data imbalance is common in real-world food-domain datasets. Traditional classification algorithms are prone to overfitting on imbalanced datasets, and their decision surface is biased toward majority-class samples, which makes minority-class samples difficult to identify. Although previous resampling techniques can deal with the imbalance problem by balancing the dataset, they may produce class overlap because the anchor samples are not appropriately selected and the generated dataset does not conform to the original distribution. This paper proposes an adaptive resampling technique that combines Gaussian distribution oversampling with random undersampling (GDRS) to address these problems. The technique uses the Chebyshev distance together with the weight information of the minority-class samples to select a suitable anchor sample. New samples conforming to the original distribution are then generated from a Gaussian distribution around the anchor sample. Finally, random undersampling is applied to reduce the possibility of overfitting. The technique is applied to five UCI datasets and compared with seven imbalanced learning methods. The experimental results demonstrate that GDRS performs best among the compared methods. We also validate the effectiveness of our method on real dairy datasets with different imbalance ratios, which is promising for application in the food field.
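The abstract names three steps (anchor selection by Chebyshev distance and weights, Gaussian generation around the anchor, random undersampling) without giving formulas. The following is a minimal Python sketch of that pipeline, not the authors' implementation. The weighting rule (share of minority-class points among the k nearest neighbours), the choice of standard deviation (a fraction of the distance to the nearest minority neighbour), and all function and parameter names (`anchor_weights`, `gaussian_oversample`, `gdrs_sketch`, `k`, `scale`, `target`) are assumptions made for illustration only.

```python
import random


def chebyshev(a, b):
    """Chebyshev (L-infinity) distance: the largest per-feature absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def anchor_weights(minority, majority, k=3):
    """Assumed weighting (not from the paper): for each minority sample, the
    share of minority points among its k nearest neighbours under Chebyshev
    distance. Samples deep inside the minority region get higher weight."""
    weights = []
    for i, m in enumerate(minority):
        neigh = [(chebyshev(m, p), 1) for j, p in enumerate(minority) if j != i]
        neigh += [(chebyshev(m, p), 0) for p in majority]
        neigh.sort(key=lambda t: t[0])
        nearest = neigh[:k]
        weights.append(sum(label for _, label in nearest) / max(len(nearest), 1))
    return weights


def gaussian_oversample(minority, majority, n_new, k=3, scale=0.5, rng=None):
    """Draw n_new synthetic minority samples from a Gaussian centred on a
    weighted-random anchor. The spread rule is an assumption."""
    rng = rng or random.Random()
    # Small offset keeps every weight positive so sampling never fails.
    weights = [w + 1e-6 for w in anchor_weights(minority, majority, k)]
    synthetic = []
    for _ in range(n_new):
        anchor = rng.choices(minority, weights=weights, k=1)[0]
        dists = [chebyshev(anchor, p) for p in minority if p is not anchor]
        sigma = scale * min(dists) if dists else scale
        synthetic.append([rng.gauss(x, sigma) for x in anchor])
    return synthetic


def gdrs_sketch(minority, majority, target, k=3, scale=0.5, seed=0):
    """Oversample the minority class up to `target` samples, then randomly
    undersample the majority class down to at most `target` samples."""
    rng = random.Random(seed)
    n_new = max(target - len(minority), 0)
    new_minority = minority + gaussian_oversample(
        minority, majority, n_new, k, scale, rng
    )
    kept_majority = rng.sample(majority, min(target, len(majority)))
    return new_minority, kept_majority
```

Calling `gdrs_sketch(minority, majority, target=n)` with two lists of feature vectors returns a resampled minority set (originals first, synthetic samples appended) and a random subset of the majority set, each with at most `n` samples. Favouring anchors whose neighbourhoods are mostly minority-class is one plausible way to limit class overlap; the paper may define the weights differently.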
Pages: 12