Missing information in imbalanced data stream: fuzzy adaptive imputation approach

Authors
Bohnishikha Halder
Md Manjur Ahmed
Toshiyuki Amagasa
Nor Ashidi Mat Isa
Rahat Hossain Faisal
Md. Mostafijur Rahman
Affiliations
[1] University of Barishal, Department of Computer Science and Engineering
[2] University of Tsukuba, Center for Computational Sciences
[3] Universiti Sains Malaysia, School of Electrical and Electronic Engineering, Engineering Campus
[4] Daffodil International University, Department of Software Engineering
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Data imputation; Missing information; Fuzzy adaptive approach; Pattern recognition; Imbalanced data; Data stream
Abstract
From a real-world perspective, missing information is a common occurrence in data streams. Missing data generally cause a variety of problems in recognizing patterns in data (i.e., clustering and classification), and missing data in a data stream are particularly challenging. When the data are also imbalanced, missing data greatly affect pattern recognition. To address these issues, this study proposes an adaptive technique with a fuzzy-based information decomposition method, which simultaneously handles incomplete data and class imbalance in a data stream. The main purpose of the proposed fuzzy adaptive imputation approach (FAIA) is to represent the effect of missing values in an imbalanced data stream and to handle the missing data problem in that setting. FAIA is a single-pass method. It adaptively selects intervals based on all observed instances, using the interrelationship of attributes to identify the correct interval for computing missing values. Here, the interrelationship of two attributes means that the value of one attribute of an instance depends on the value of another attribute of the same instance. In FAIA, the distances from a given missing value to all intervals are measured, and the interval with the least distance is selected for that missing value. To balance the data, synthetic minority-class data are generated using the same process as missing value imputation; this is called oversampling. In a data stream, the instances of a dataset are divided into chunks, and the data are balanced without any ensemble of previous chunks because missing values may mislead future chunks.
To demonstrate the performance of FAIA, the experiment is divided into three parts: missing data imputation, imbalanced data for offline data and for data streams, and imbalanced data with missing values for offline data. Eleven numerical datasets with different dimensions from various repositories are used to evaluate missing data imputation and imbalanced data without a data stream. Four different datasets are also used to measure performance on imbalanced data streams. In most of the measured cases, the proposed method outperforms the methods it is compared with.
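The interval-selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the published FAIA algorithm: the abstract does not give the fuzzy decomposition details, so the equal-width intervals, the plain interval mean used as the imputed value, and the names `impute_by_nearest_interval`, `known`, `target`, and `n_intervals` are all assumptions made for illustration. The sketch only shows the general idea of using a related attribute to pick the interval with the least distance for each missing value.

```python
import numpy as np


def impute_by_nearest_interval(known, target, n_intervals=4):
    """Fill NaNs in `target` using intervals built on the related attribute `known`.

    Illustrative simplification, not the published FAIA method:
    equal-width intervals and a plain interval mean are assumptions.
    """
    known = np.asarray(known, dtype=float)
    target = np.asarray(target, dtype=float)
    observed = ~np.isnan(target)

    # Build equal-width intervals over the related attribute of observed instances.
    edges = np.linspace(known[observed].min(), known[observed].max(), n_intervals + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Assign each observed instance to an interval (the top edge falls in the last one).
    bins = np.clip(np.digitize(known[observed], edges[1:-1]), 0, n_intervals - 1)

    # Mean of the target attribute inside each non-empty interval.
    means = np.array([
        target[observed][bins == k].mean() if np.any(bins == k) else np.nan
        for k in range(n_intervals)
    ])
    usable = ~np.isnan(means)

    filled = target.copy()
    for i in np.flatnonzero(~observed):
        # Measure the distance to every usable interval and keep the least one.
        dist = np.abs(centers[usable] - known[i])
        filled[i] = means[usable][np.argmin(dist)]
    return filled


# Hypothetical toy data: `target` roughly depends on `known`.
known = [1, 2, 3, 4, 10, 11, 12, 13]
target = [10, 12, np.nan, 14, 100, np.nan, 104, 102]
filled = impute_by_nearest_interval(known, target)
print(filled)
```

Under the same assumptions, the routine could also be applied to minority-class instances to generate synthetic values for oversampling, and run separately on each chunk of a stream so that no information from earlier chunks is reused.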
Pages: 5561-5583
Number of pages: 22
Related papers
50 in total
  • [21] Missing Categorical Data Imputation Approach Based on Similarity
    Wu, Sen
    Feng, Xiaodong
    Han, Yushan
    Wang, Qiang
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2827 - 2832
  • [22] Missing Data Imputation With Baseline Information in Longitudinal Clinical Trials
    Zhang, Yilong
    Zimmer, Zachary
    Xu, Lei
    Lam, Raymond L. H.
    Huyck, Susan
    Golm, Gregory
    STATISTICS IN BIOPHARMACEUTICAL RESEARCH, 2022, 14 (02): : 242 - 248
  • [23] Missing data imputation by utilizing information within incomplete instances
    Zhang, Shichao
    Jin, Zhi
    Zhu, Xiaofeng
    JOURNAL OF SYSTEMS AND SOFTWARE, 2011, 84 (03) : 452 - 459
  • [24] Approach to Imputation Multivariate Missing Data of Urban Buildings by Chained Equations Based on Geospatial Information
    Khrulkov, Alexander A.
    Mishina, Margarita E.
    Mityagin, Sergey A.
    COMPUTATIONAL SCIENCE - ICCS 2022, PT III, 2022, 13352 : 234 - 247
  • [26] Optimal imputation of the missing data using multi auxiliary information
    Bhushan, Shashi
    Pandey, Abhay Pratap
    COMPUTATIONAL STATISTICS, 2021, 36 (01) : 449 - 477
  • [27] A New Fuzzy Adaptive Algorithm to Classify Imbalanced Data
    Patel, Harshita
    Rajput, Dharmendra Singh
    Stan, Ovidiu Petru
    Miclea, Liviu Cristian
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (01): : 73 - 89
  • [28] Adaptive Deep Incremental Learning - Assisted Missing Data Imputation for Streaming Data
    Syavasya, C. V. S. R.
    Lakshmi, M. A.
    JOURNAL OF INTERCONNECTION NETWORKS, 2022, 22 (SUPP02)
  • [29] An Exploration of Online Missing Value Imputation in Non-stationary Data Stream
    Dong W.
    Gao S.
    Yang X.
    Yu H.
    SN Computer Science, 2021, 2 (2)
  • [30] A new iterative fuzzy clustering algorithm for multiple imputation of missing data
    Nikfalazar, Sanaz
    Yeh, Chung-Hsing
    Bedingfield, Susan
    Khorshidi, Hadi A.
    2017 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2017,