Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams

Cited by: 0
Authors
Jakob, Jonathan [1 ]
Artelt, Andre [1 ,2 ]
Hasenjaeger, Martina [3 ]
Hammer, Barbara [1 ]
Affiliations
[1] Bielefeld Univ, Tech Fac, Bielefeld, Germany
[2] Univ Cyprus, Dept Comp Sci, Nicosia, Cyprus
[3] Honda Res Inst, Learning & Personalizat Grp, Offenbach, Germany
Funding
European Research Council
Keywords
CONCEPT DRIFT; MODEL
DOI
10.1080/08839514.2023.2198846
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In many real-world scenarios, data arrive as a potentially infinite stream of samples subject to changes in the underlying data distribution, a phenomenon referred to as concept drift. A specific facet of concept drift is feature drift, where the relevance of a feature to the problem at hand changes over time. High dimensionality of the data poses an additional challenge to learning algorithms operating in such environments. Common scenarios of this nature arise, for example, in sensor-based maintenance of industrial machines or in entire networks, such as power grids or water distribution systems. However, since most existing methods for incremental learning focus on classification tasks, efficient online learning for regression remains underdeveloped. In this work, we introduce an extension of the SAM-kNN regressor that incorporates metric learning in order to improve prediction quality on data streams, gain insight into the relevance of individual input features and, based on these relevances, project the input data into a lower-dimensional space in order to reduce computational cost and improve suitability for high-dimensional data. We evaluate the proposed method on artificial data to demonstrate its applicability in various scenarios. In addition, we apply the method to the real-world problem of water distribution network monitoring. Specifically, we demonstrate that sensor faults in the water distribution network can be detected by monitoring the feature relevances computed by our algorithm.
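The core idea described in the abstract, a kNN regressor on a sliding window whose distance metric carries learnable per-feature relevances, can be sketched as follows. This is a minimal illustration only: the class name, the relevance-update heuristic, and all hyperparameters are assumptions, not the authors' SAM-kNN extension.

```python
import numpy as np


class RelevanceWeightedKNNRegressor:
    """Minimal sketch: kNN regression on a data stream with a diagonal
    relevance metric. The relevance-update rule below is an illustrative
    heuristic, not the metric-learning scheme from the paper."""

    def __init__(self, k=5, n_features=2, lr=0.01, window=200):
        self.k = k
        self.lr = lr
        self.window = window          # sliding memory of recent samples
        self.w = np.ones(n_features)  # per-feature relevances (diagonal metric)
        self.X, self.y = [], []

    def _neighbor_idx(self, x):
        X = np.asarray(self.X)
        # relevance-weighted squared Euclidean distance to every stored sample
        d = ((X - x) ** 2 * self.w).sum(axis=1)
        return np.argsort(d)[: self.k]

    def predict(self, x):
        if len(self.X) < self.k:
            return float(np.mean(self.y)) if self.y else 0.0
        idx = self._neighbor_idx(np.asarray(x, dtype=float))
        return float(np.mean(np.asarray(self.y)[idx]))

    def partial_fit(self, x, y):
        """Interleaved test-then-train step, as is common on streams."""
        x = np.asarray(x, dtype=float)
        if len(self.X) >= self.k:
            idx = self._neighbor_idx(x)
            err = self.predict(x) - y
            # Heuristic update: when the error is large, down-weight
            # features along which the chosen neighbors are far from x,
            # then renormalize so the mean relevance stays at 1.
            nbrs = np.asarray(self.X)[idx]
            grad = (err ** 2) * ((nbrs - x) ** 2).mean(axis=0)
            self.w = np.clip(self.w - self.lr * grad, 1e-3, None)
            self.w *= len(self.w) / self.w.sum()
        self.X.append(x)
        self.y.append(float(y))
        if len(self.X) > self.window:  # forget the oldest sample
            self.X.pop(0)
            self.y.pop(0)
```

Tracking the relevance vector `w` over time is what makes such a model interpretable: a sudden change in one sensor's relevance is exactly the kind of signal the abstract uses to detect sensor faults in a water distribution network.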
Pages: 30
Related Papers (50 in total)
  • [1] SAM-kNN Regressor for Online Learning in Water Distribution Networks
    Jakob, Jonathan
    Artelt, Andre
    Hasenjaeger, Martina
    Hammer, Barbara
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 752 - 762
  • [2] Balanced SAM-kNN: Online Learning with Heterogeneous Drift and Imbalanced Data
    Vaquet, Valerie
    Hammer, Barbara
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 850 - 862
  • [3] High-dimensional kNN joins with incremental updates
    Yu, Cui
    Zhang, Rui
    Huang, Yaochun
    Xiong, Hui
    [J]. GEOINFORMATICA, 2010, 14 (01) : 55 - 82
  • [5] Interpretable Approximation of High-Dimensional Data
    Potts, Daniel
    Schmischke, Michael
    [J]. SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2021, 3 (04): : 1301 - 1323
  • [6] Learning High-Dimensional Evolving Data Streams With Limited Labels
    Din, Salah Ud
    Kumar, Jay
    Shao, Junming
    Mawuli, Cobbinah Bernard
    Ndiaye, Waldiodio David
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (11) : 11373 - 11384
  • [7] Bayesian evolutionary hypernetworks for interpretable learning from high-dimensional data
    Kim, Soo-Jin
    Ha, Jung-Woo
    Kim, Heebal
    Zhang, Byoung-Tak
    [J]. APPLIED SOFT COMPUTING, 2019, 81
  • [8] Boosting for Vote Learning in High-dimensional kNN Classification
    Tomasev, Nenad
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2014, : 676 - 683
  • [9] Scalable and Interpretable Data Representation for High-Dimensional, Complex Data
    Kim, Been
    Patel, Kayur
    Rostamizadeh, Afshin
    Shah, Julie
    [J]. PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 1763 - 1769
  • [10] Learning high-dimensional data
    Verleysen, M
    [J]. LIMITATIONS AND FUTURE TRENDS IN NEURAL COMPUTATION, 2003, 186 : 141 - 162