An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning

Cited by: 10
Authors
Jose, Sharu Theresa [1 ]
Simeone, Osvaldo [1 ]
Institutions
[1] King's College London, Department of Engineering, King's Communications, Learning & Information Processing (KCLIP) Lab, London, England
Funding
European Research Council
DOI
10.1109/ISIT45174.2021.9517767
CLC number
TP301 [Theory, Methods]
Subject classification number
081202
Abstract
Meta-learning aims to optimize the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters achieve a small loss when applied to the training of a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks, known as the meta-generalization gap, is a measure of the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [2], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with meta-learned bias.
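In symbols, with notation introduced here purely for illustration (it need not match the paper's own), the quantity being bounded is the average absolute meta-generalization gap

    \Delta = \mathbb{E}\big[\, \big|\, \mathcal{L}_{\mathrm{pop}}(U) - \mathcal{L}_{\mathrm{emp}}(U, Z_{1:N}) \,\big| \,\big],

where U is the hyperparameter output by the meta-learner, Z_{1:N} collects the datasets of the N observed tasks (M samples each), \mathcal{L}_{\mathrm{pop}}(U) is the expected loss incurred when U is used to train on a fresh task drawn from the environment, and \mathcal{L}_{\mathrm{emp}} is its empirical counterpart on Z_{1:N}. The bounds then depend on N, M, and KL or JS divergence terms that quantify task dissimilarity.

To make the illustrative example concrete, below is a minimal numpy sketch of ridge regression with a meta-learned bias: each task solves a ridge problem regularized toward a shared bias vector u, and u is tuned to minimize the average per-task training loss. The synthetic task environment, all constants, and the finite-difference meta-update are assumptions of this sketch, not the paper's experimental setup.

    # Minimal sketch: ridge regression with a meta-learned bias.
    # The synthetic task environment and all constants are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n_tasks, lam = 5, 20, 50, 1.0  # feature dim, samples/task, tasks, ridge weight

    def sample_task():
        # Each task's weight vector perturbs a shared environment mean;
        # the perturbation scale controls how similar the tasks are.
        w_true = np.ones(d) + 0.1 * rng.standard_normal(d)
        X = rng.standard_normal((m, d))
        y = X @ w_true + 0.1 * rng.standard_normal(m)
        return X, y

    tasks = [sample_task() for _ in range(n_tasks)]

    def per_task_solution(X, y, u):
        # Closed form of argmin_w (1/m)||Xw - y||^2 + lam * ||w - u||^2.
        A = X.T @ X / m + lam * np.eye(d)
        return np.linalg.solve(A, X.T @ y / m + lam * u)

    def meta_loss(u):
        # Empirical meta-training loss: average per-task loss of the biased solutions.
        return np.mean([np.mean((X @ per_task_solution(X, y, u) - y) ** 2)
                        for X, y in tasks])

    # Meta-learn the bias by finite-difference gradient descent
    # (a dependency-free stand-in for any gradient-based meta-learner).
    u, eps, step = np.zeros(d), 1e-4, 0.5
    for _ in range(200):
        grad = np.array([(meta_loss(u + eps * e) - meta_loss(u - eps * e)) / (2 * eps)
                         for e in np.eye(d)])
        u -= step * grad

    print("meta-learned bias:", np.round(u, 2))  # drifts toward the environment mean

Roughly speaking, the gap analyzed in the paper is the difference between this meta-training objective and its population counterpart on fresh tasks from the same environment; shrinking the perturbation scale in sample_task makes the tasks more similar, which shrinks the divergence terms in the bounds.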
Pages: 1534-1539
Page count: 6
Related papers
50 items in total
  • [41] Information-theoretic analysis of neural coding
    Johnson, DH
    Gruner, CM
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 1937 - 1940
  • [42] Information-Theoretic Analysis of Neural Coding
    Johnson, Don H.
    Gruner, Charlotte M.
    Baggerly, Keith
    Seshagiri, Chandran
    Journal of Computational Neuroscience, 2001, 10 : 47 - 69
  • [43] Similarity interaction in information-theoretic self-organizing maps
    Kamimura, Ryotaro
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2013, 42 (03) : 239 - 267
  • [44] Information-Theoretic Considerations in Batch Reinforcement Learning
    Chen, Jinglin
    Jiang, Nan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [45] Identifying Cover Songs Using Information-Theoretic Measures of Similarity
    Foster, Peter
    Dixon, Simon
    Klapuri, Anssi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (06) : 993 - 1005
  • [46] Student survey by information-theoretic competitive learning
    Kamimura, Ryotaro
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006, : 5135 - 5140
  • [47] Feature extraction using information-theoretic learning
    Hild, Kenneth E., II
    Erdogmus, Deniz
    Torkkola, Kari
    Principe, Jose C.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2006, 28 (09) : 1385 - 1392
  • [48] Information-Theoretic Analysis of Haplotype Assembly
    Si, Hongbo
    Vikalo, Haris
    Vishwanath, Sriram
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2017, 63 (06) : 3468 - 3479
  • [49] Information-Theoretic Analysis of Spherical Fingerprinting
    Moulin, Pierre
    Wang, Ying
    2009 INFORMATION THEORY AND APPLICATIONS WORKSHOP, 2009: 226+
  • [50] Information-theoretic analysis of neural coding
    Johnson, DH
    Gruner, CM
    Baggerly, K
    Seshagiri, C
    JOURNAL OF COMPUTATIONAL NEUROSCIENCE, 2001, 10 (01) : 47 - 69