Tutor-based learning of visual categories using different levels of supervision

Cited by: 2
Authors
Fritz, Mario [1, 2]
Kruijff, Geert-Jan M. [3]
Schiele, Bernt [4, 5]
Affiliations
[1] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[2] ICSI, Berkeley, CA USA
[3] DFKI GmbH, Language Technol Lab, Saarbrucken, Germany
[4] Tech Univ Darmstadt, CS Dept, Saarbrucken, Germany
[5] MPI Informat, Saarbrucken, Germany
Keywords
Object categorization; Cross-modal learning; Tutor-based learning; Incremental learning; Interactive learning; Unsupervised learning; Semi-supervised learning;
DOI
10.1016/j.cviu.2009.12.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years we have seen much strong work in visual recognition, dialogue interpretation and multi-modal learning that aims to provide the building blocks that enable intelligent robots to interact with humans in a meaningful way and even to evolve continuously during this process. Building systems that unify those components under a common architecture has turned out to be challenging, as each approach comes with its own set of assumptions, restrictions, and implications. For example, the impact of recent progress in visual category recognition has been limited from the perspective of interactive systems. The reasons for this are diverse. We identify and address two major challenges in integrating modern techniques for visual categorization into an interactive learning system: reducing the number of required labeled training examples and dealing with potentially erroneous input. Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive because they can draw on far larger amounts of unlabeled training data. We propose a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome the inherent limitations of purely unsupervised training. The method uses a unified and scale-invariant object representation that allows labeled as well as unlabeled information to be handled in a coherent way. First experiments demonstrate the ability of the system to learn object category models from many unlabeled observations and a few dialogue interactions that can be ambiguous or even erroneous. © 2010 Elsevier Inc. All rights reserved.
Pages: 564-573
Page count: 10