Information-Theoretic Data Discarding for Dynamic Trees on Data Streams

Cited by: 8
Authors
Anagnostopoulos, Christoforos [1]
Gramacy, Robert B. [2]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Math, London SW7 2AZ, England
[2] Univ Chicago, Booth Sch Business, Chicago, IL 60637 USA
Keywords
regression and classification trees; dynamic trees; streaming data; massive data; online learning; active learning; regression
DOI
10.3390/e15125510
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
Ubiquitous automated data collection at an unprecedented scale is making available streaming, real-time information flows in a wide variety of settings, transforming both science and industry. Learning algorithms deployed in such contexts often rely on single-pass inference, where the data history is never revisited. Learning may also need to be temporally adaptive to remain up-to-date against unforeseen changes in the data-generating mechanism. Online Bayesian inference remains challenged by such transient, evolving data streams. Nonparametric modeling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting information-theoretic heuristics, such as exponential forgetting and active learning, into a fully Bayesian framework. We showcase our methods by augmenting a modern nonparametric modeling framework, dynamic trees, and illustrate its performance on a number of practical examples. The end product is a powerful streaming regression and classification tool, whose performance compares favorably to the state-of-the-art.
Pages: 5510-5535
Number of pages: 26