Information-Theoretic Data Discarding for Dynamic Trees on Data Streams

Cited by: 8
Authors
Anagnostopoulos, Christoforos [1]
Gramacy, Robert B. [2]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Math, London SW7 2AZ, England
[2] Univ Chicago, Booth Sch Business, Chicago, IL 60637 USA
Keywords
regression and classification trees; dynamic trees; streaming data; massive data; online learning; active learning; regression
DOI
10.3390/e15125510
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
Ubiquitous automated data collection at an unprecedented scale is making available streaming, real-time information flows in a wide variety of settings, transforming both science and industry. Learning algorithms deployed in such contexts often rely on single-pass inference, where the data history is never revisited. Learning may also need to be temporally adaptive to remain up-to-date against unforeseen changes in the data-generating mechanism. Online Bayesian inference remains challenged by such transient, evolving data streams. Nonparametric modeling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting information-theoretic heuristics, such as exponential forgetting and active learning, into a fully Bayesian framework. We showcase our methods by augmenting a modern nonparametric modeling framework, dynamic trees, and illustrate their performance on a number of practical examples. The end product is a powerful streaming regression and classification tool, whose performance compares favorably to the state-of-the-art.
Pages: 5510-5535
Number of pages: 26