Efficient multi-attribute image classification through context-driven networks

Authors
Banger, Sean [1]
Ceresani, Ryan [1]
Twedt, Jason [1]
Affiliation
[1] Lockheed Martin AI Ctr, King Of Prussia, PA 19406 USA
Keywords
image classification; computer vision; deep learning; neural networks; attention; transformers; multitask learning; visual question answering
DOI
10.1117/12.2618977
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Performing many tasks simultaneously on a resource-limited device is challenging because the available computational resources are scarce. Efficient, universal model architectures are key to solving this problem. Existing sub-fields of machine learning, such as Multi-Task Learning (MTL), have shown that learning multiple tasks with a single neural network architecture is possible and can improve sample efficiency and memory efficiency while making the model less prone to overfitting. In Visual Question Answering (VQA), a model ingests multi-modal input to produce text-based responses in the context of an image. Our proposed architecture, TaskNet, merges the MTL and VQA concepts. TaskNet solves the visual MTL problem by using an input task to provide context to the network and guide its attention mechanism toward a relevant response. Our approach saves memory without sacrificing performance relative to naively training an independent model for each task. TaskNet efficiently provides multiple fine-grained classifications for a single input image and seamlessly incorporates context-specific metadata to further boost performance in high-variance settings.
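The abstract does not specify TaskNet's layers, so the following NumPy code is only a hypothetical sketch of the general idea it describes: a single shared model in which the requested task supplies the context that steers attention. It assumes, for illustration, that the task is given as an integer ID mapped to a learned query vector (rather than as text, as in VQA), that image patch features come from some shared backbone (random stand-ins here), and that each task has a small output head. All names (`TaskConditionedClassifier`, `task_queries`, `num_params_independent`, and so on) are invented for this sketch, and the weights are random and untrained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TaskConditionedClassifier:
    """Hypothetical sketch: one shared trunk; the task ID selects an attention query and a head."""

    def __init__(self, feat_dim, num_classes_per_task, seed=0):
        rng = np.random.default_rng(seed)
        self.feat_dim = feat_dim
        n_tasks = len(num_classes_per_task)
        # One learned query per task: this is the "context" that guides attention.
        self.task_queries = rng.normal(size=(n_tasks, feat_dim))
        # Key/value projections shared by every task (the memory-saving part).
        self.w_k = rng.normal(size=(feat_dim, feat_dim)) / np.sqrt(feat_dim)
        self.w_v = rng.normal(size=(feat_dim, feat_dim)) / np.sqrt(feat_dim)
        # Small per-task classification heads.
        self.heads = [rng.normal(size=(feat_dim, c)) / np.sqrt(feat_dim)
                      for c in num_classes_per_task]

    def forward(self, patch_feats, task_id):
        """patch_feats: (N, D) features for N image patches from a shared backbone."""
        keys = patch_feats @ self.w_k                # (N, D)
        values = patch_feats @ self.w_v              # (N, D)
        q = self.task_queries[task_id]               # (D,)
        attn = softmax(keys @ q / np.sqrt(self.feat_dim))  # (N,) weights over patches
        pooled = attn @ values                       # (D,) task-specific image summary
        probs = softmax(pooled @ self.heads[task_id])  # (C_task,) class probabilities
        return probs, attn

    def num_params(self):
        shared = self.task_queries.size + self.w_k.size + self.w_v.size
        return shared + sum(h.size for h in self.heads)

    def num_params_independent(self):
        # Hypothetical baseline: every task gets its own query, projections and head.
        per_task_trunk = self.feat_dim + self.w_k.size + self.w_v.size
        return len(self.heads) * per_task_trunk + sum(h.size for h in self.heads)


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(49, 64))  # e.g. a 7x7 grid of patches
    model = TaskConditionedClassifier(64, [5, 3, 10])
    for task in range(3):
        probs, attn = model.forward(feats, task)
        print(task, probs.round(3), int(attn.argmax()))
    print(model.num_params(), "vs", model.num_params_independent())
```

The same image features produce a different attention map and a different label set depending on the task ID, and sharing the projections is what keeps the parameter count below that of one model per task. Whether TaskNet uses this exact form of conditioning is not stated in the abstract.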
Pages: 8