Zero-Shot Object Detection with Textual Descriptions Using Convolutional Neural Networks

Times cited: 4

Authors:

Zhang, Licheng [1, 2]

Wang, Xianzhi [2]

Yao, Lina [3]

Zheng, Feng [1]

Affiliations:

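[1] Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, Guangdong, People's Republic of China

[2] University of Technology Sydney, School of Computer Science, Sydney, NSW, Australia

[3] University of New South Wales, School of Computer Science and Engineering, Sydney, NSW, Australia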

Source:

2020 International Joint Conference on Neural Networks (IJCNN) | 2020

Funding:

National Natural Science Foundation of China

Keywords:

zero-shot object detection; textual description; word vector representation; convolutional neural network; online hard example mining

DOI:

10.1109/ijcnn48605.2020.9207417

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Zero-shot object detection aims to detect and recognize objects in images whose classes were not observed in the training samples. Previous studies generally used concept names or textual descriptions to build relationships between seen and unseen classes. However, these works rarely exploited the information in textual descriptions to optimize the network. Textual descriptions contain rich information related to object categories, and exploiting it can help train the network and improve detection performance. In addition, textual descriptions usually contain the names of the objects to be detected. By exploiting this property, we can narrow the set of candidate unseen categories and thus improve detection accuracy. To this end, we propose a novel framework that incorporates both images and their textual descriptions for zero-shot object detection. In particular, we employ a text convolutional neural network (CNN) and Faster R-CNN to extract text features and image features, respectively, and combine them to simultaneously refine the regions that contain objects in images and classify the newly detected objects. We also extract potential object labels directly from the textual descriptions and introduce online hard example mining (OHEM) to assist with object classification and network optimization. Extensive experiments on two public datasets demonstrate that our approach outperforms state-of-the-art methods.
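The abstract names these mechanisms without giving implementation details. As a rough, non-authoritative illustration, the Python sketch below shows simplified versions of two of them: narrowing the candidate unseen classes to those named in an image's description (then scoring a region against the remaining classes' word vectors), and OHEM-style selection of the highest-loss regions of interest. Everything here is an assumption rather than the authors' code: the hypothetical function names (`candidate_unseen_classes`, `classify_region`, `ohem_select`), the regex matching rule, cosine similarity as the scoring function, and the toy 3-dimensional word vectors. The text CNN, the Faster R-CNN backbone, and the feature fusion are not modelled.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple


def candidate_unseen_classes(description: str, unseen_classes: Sequence[str]) -> List[str]:
    """Narrow the unseen label set to classes named in the description.

    Assumption: case-insensitive whole-word matching with an optional plural
    "s"; the paper's actual label-extraction rule is not specified here.
    Falls back to every unseen class when none is mentioned.
    """
    text = description.lower()
    found = [
        c for c in unseen_classes
        if re.search(r"\b" + re.escape(c.lower()) + r"s?\b", text)
    ]
    return found if found else list(unseen_classes)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify_region(region_emb: Sequence[float],
                    word_vectors: Dict[str, Sequence[float]],
                    candidates: Sequence[str]) -> Tuple[str, float]:
    """Assign a region to the candidate class with the most similar word vector.

    Assumption: the region feature has already been projected into the
    word-vector space (that projection is not shown).
    """
    scores = {c: cosine(region_emb, word_vectors[c]) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]


def ohem_select(roi_losses: Sequence[float], batch_size: int) -> List[int]:
    """OHEM-style selection: indices of the `batch_size` highest-loss RoIs.

    Only these RoIs would contribute to the backward pass. The NMS step that
    standard OHEM applies to RoIs before ranking is omitted for brevity.
    """
    order = sorted(range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True)
    return order[:batch_size]


if __name__ == "__main__":
    # Toy 3-d "word vectors" (hypothetical; real systems typically use
    # pretrained embeddings such as word2vec or GloVe).
    word_vectors = {"zebra": [0.9, 0.1, 0.0],
                    "giraffe": [0.1, 0.9, 0.0],
                    "sofa": [0.0, 0.1, 0.9]}
    desc = "Two zebras graze next to a giraffe on the savanna."
    cands = candidate_unseen_classes(desc, list(word_vectors))
    region = [0.2, 0.1, 0.95]  # hypothetical projected region embedding
    print(cands)                                                         # ['zebra', 'giraffe']
    print(classify_region(region, word_vectors, cands)[0])               # zebra
    print(classify_region(region, word_vectors, list(word_vectors))[0])  # sofa
    print(ohem_select([0.1, 2.3, 0.7, 1.5], batch_size=2))               # [1, 3]
```

In the toy run, the noisy region embedding is closest to "sofa" when every unseen class is allowed, but is assigned to "zebra" once the candidates are restricted to the classes the description mentions, illustrating how narrowing the candidate set can change a prediction.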


Pages: 6

Records: 50 in total (first 10 shown)

[1] Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions. Ba, Jimmy Lei; Swersky, Kevin; Fidler, Sanja; Salakhutdinov, Ruslan. 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 4247-4255.
[2] Zero-Shot Object Detection with Textual Descriptions. Li, Zhihui; Yao, Lina; Zhang, Xiaoqin; Wang, Xianzhi; Kanhere, Salil; Zhang, Huaxiang. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 8690-8697.
[3] Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions. Elhoseiny, Mohamed; Saleh, Babak; Elgammal, Ahmed. 2013 IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2584-2591.
[4] Zero-Shot Object Detection. Bansal, Ankan; Sikka, Karan; Sharma, Gaurav; Chellappa, Rama; Divakaran, Ajay. Computer Vision - ECCV 2018, Part I, 2018, vol. 11205, pp. 397-414.
[5] Zero-Shot Object Detection with Transformers. Zheng, Ye; Cui, Li. 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 444-448.
[6] Zero-Shot Camouflaged Object Detection. Li, Haoran; Feng, Chun-Mei; Xu, Yong; Zhou, Tao; Yao, Lina; Chang, Xiaojun. IEEE Transactions on Image Processing, 2023, vol. 32, pp. 5126-5137.
[7] Shot Category Detection based on Object Detection Using Convolutional Neural Networks. Jung, Deokkyu; Son, Jeong-Woo; Kim, Sun-Joong. 2018 20th International Conference on Advanced Communication Technology (ICACT), 2018, pp. 36-39.
[8] Zero-shot Learning Using Multimodal Descriptions. Mall, Utkarsh; Hariharan, Bharath; Bala, Kavita. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2022), 2022, pp. 3930-3938.
[9] Zero-Shot Object Detection for Indoor Robots. Abdalwhab, Abdalwhab; Liu, Huaping. 2019 International Joint Conference on Neural Networks (IJCNN), 2019.
[10] Transductive Learning for Zero-Shot Object Detection. Rahman, Shafin; Khan, Salman; Barnes, Nick. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019, pp. 6081-6090.
