Missing information in imbalanced data stream: fuzzy adaptive imputation approach

Authors
Bohnishikha Halder
Md Manjur Ahmed
Toshiyuki Amagasa
Nor Ashidi Mat Isa
Rahat Hossain Faisal
Md. Mostafijur Rahman
Affiliations
[1] University of Barishal, Department of Computer Science and Engineering
[2] University of Tsukuba, Center for Computational Sciences
[3] Universiti Sains Malaysia, School of Electrical and Electronic Engineering, Engineering Campus
[4] Daffodil International University, Department of Software Engineering
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Data imputation; Missing information; Fuzzy adaptive approach; Pattern recognition; Imbalanced data; Data stream
DOI
Not available
Abstract
In real-world data streams, missing information is common. Missing data cause a range of problems for pattern recognition tasks such as clustering and classification, and they are especially challenging in a data stream. When the data are also imbalanced, missing values affect pattern recognition even more strongly. To address these issues, this study proposes an adaptive technique built on a fuzzy-based information decomposition method, which handles incomplete data and class imbalance in a data stream at the same time. The proposed fuzzy adaptive imputation approach (FAIA) aims to represent the effect of missing values in an imbalanced data stream and to handle the missing data problem in that setting. FAIA is a single-pass method. It adaptively selects intervals based on all observed instances, using the interrelationship of attributes to identify the correct interval for computing missing values. Here, the interrelationship of two attributes means that the value of one attribute of an instance depends on the value of another attribute of the same instance. In FAIA, the distances from a given missing value to all intervals are measured, and the interval with the least distance is selected for that missing value. To balance the data, synthetic minority-class instances are generated with the same process used for missing value imputation, which amounts to oversampling. In the data stream, the instances of a dataset are divided into chunks, and the data are balanced without any ensemble of previous chunks, because missing values may mislead future chunks. To demonstrate the performance of FAIA, the experiment is divided into three parts: missing data imputation, imbalanced information for offline data and for the data stream, and imbalanced information with missing values for offline data.
Eleven numerical datasets of different dimensions, drawn from various repositories, are used to evaluate missing data imputation and imbalanced data handling without a data stream. Four further datasets are used to measure performance on imbalanced data streams. The proposed method performs best in most of the measured cases.
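The interval-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' FAIA implementation: it omits the fuzzy information decomposition and the oversampling step, and it assumes equal-width intervals over a single related attribute, with the interval mean used as the imputed value. The function names `build_intervals` and `impute_chunk` and the parameter `k` are hypothetical and chosen only for illustration.

```python
def build_intervals(pairs, k):
    """Split observed (x, y) pairs into k equal-width intervals over x.

    Returns a list of (center_x, mean_y), one entry per non-empty interval.
    Assumption: equal-width intervals; the paper selects intervals adaptively.
    """
    xs = [x for x, _ in pairs]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all x are equal
    buckets = [[] for _ in range(k)]
    for x, y in pairs:
        idx = min(int((x - lo) / width), k - 1)  # put x == hi in the last bucket
        buckets[idx].append(y)
    intervals = []
    for i, ys in enumerate(buckets):
        if ys:
            center = lo + (i + 0.5) * width
            intervals.append((center, sum(ys) / len(ys)))
    return intervals


def impute_chunk(chunk, k=3):
    """Fill missing y values (None) in one chunk of (x, y) instances.

    Only the current chunk is used, so nothing is carried over from
    previous chunks. A missing y is left as None if the chunk has no
    observed instances to build intervals from.
    """
    observed = [(x, y) for x, y in chunk if y is not None]
    intervals = build_intervals(observed, k) if observed else []
    filled = []
    for x, y in chunk:
        if y is None and intervals:
            # Select the interval at the least distance from this instance.
            _, y = min(intervals, key=lambda iv: abs(iv[0] - x))
        filled.append((x, y))
    return filled


chunk = [(1.0, 10.0), (2.0, 12.0), (5.0, 20.0), (6.0, 22.0),
         (9.0, 30.0), (10.0, 32.0), (5.5, None)]
print(impute_chunk(chunk)[-1])  # (5.5, 21.0)
```

In this example the three intervals have centers 2.5, 5.5, and 8.5, so the instance with x = 5.5 takes the mean of the middle interval. Under the same assumptions, synthetic minority-class instances could be produced by treating the values to be generated as missing and filling them in the same way.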
Pages: 5561 - 5583
Number of pages: 22