[2] Queensland Univ Technol QUT, Ctr Robot, Brisbane, Qld, Australia
Source:
ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2021 | 2021 / Volume 12873
Keywords:
Imitation learning;
Reinforcement learning;
Image segmentation;
Reward-centric objects;
First person point of view;
MineRL;
Minecraft;
DOI:
10.1007/978-3-030-87626-5_25
Chinese Library Classification:
TP18 [Artificial Intelligence Theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
This work presents a learning approach for masking rewarding objects in images using sparse reward signals from an imitation learning dataset. To this end, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask such that swapping the masked areas between a high-score image and a low-score image decreases the critic's score of the high-score image and increases the critic's score of the low-score image. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where it learned to mask rewarding objects in a complex, interactive 3D environment with a sparse reward signal. This approach was part of the 1st-place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation.
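The swap step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the soft-mask blending, and the form of the objective (critic score of the swapped high-score image minus that of the swapped low-score image) are assumptions made here for illustration, and `critic` stands for any callable that maps an image to a scalar score.

```python
import numpy as np

def swap_masked_regions(high_img, low_img, mask):
    """Exchange the masked pixels between two images.

    high_img, low_img: float arrays of shape (H, W, C).
    mask: float array of shape (H, W, 1) with values in [0, 1];
          1 marks pixels to swap (assumed soft-mask blending).
    """
    # The masked area of each image is replaced by the other image's pixels.
    high_swapped = (1.0 - mask) * high_img + mask * low_img
    low_swapped = (1.0 - mask) * low_img + mask * high_img
    return high_swapped, low_swapped

def swap_objective(critic, high_img, low_img, mask):
    """Hypothetical objective to minimize with respect to the mask.

    It is small when the swap lowers the critic's score of the
    high-score image and raises the score of the low-score image.
    """
    high_swapped, low_swapped = swap_masked_regions(high_img, low_img, mask)
    return critic(high_swapped) - critic(low_swapped)
```

In a training setup, the mask would come from the segmentation network and the objective would be differentiated through a fixed critic; this sketch only shows the swap arithmetic on given arrays.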