Zero-Shot Object Detection with Textual Descriptions Using Convolutional Neural Networks

Cited by: 4
Authors
Zhang, Licheng [1, 2]
Wang, Xianzhi [2]
Yao, Lina [3]
Zheng, Feng [1]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
[3] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China
Keywords
zero-shot object detection; textual description; word vector representation; convolutional neural network; online hard example mining
DOI
10.1109/ijcnn48605.2020.9207417
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Zero-shot object detection aims to detect and recognize objects in images whose classes were not observed in the training samples. Previous studies generally used concept names or textual descriptions to build relationships between seen and unseen classes. However, these works rarely exploited the information in textual descriptions to optimize the network. Textual descriptions contain much valuable category-related information, and exploiting it can help train the network and improve detection performance. In addition, textual descriptions usually contain the names of the objects to be detected. By exploiting this characteristic, we can narrow the scope of candidate unseen categories and thus improve detection accuracy. To this end, we propose a novel framework that incorporates both images and their textual descriptions for zero-shot object detection. In particular, we employ a text convolutional neural network (CNN) and Faster R-CNN to extract text features and image features, respectively, and combine them to optimize the regions that contain objects in images and to classify the newly detected objects simultaneously. We also try to extract potential object labels directly from the textual descriptions and introduce online hard example mining (OHEM) to assist with object classification and network optimization. Extensive experiments on two public datasets demonstrate that our approach outperforms state-of-the-art methods.
Pages: 6
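Two ideas from the abstract can be illustrated with a small sketch: narrowing the candidate unseen categories to those named in a textual description, and online hard example mining (OHEM), which keeps only the highest-loss examples for the training update. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The function names `narrow_candidates` and `ohem_select`, the word-level matching rule, and the fallback to the full class list are all hypothetical choices made for this example; the record does not say how the paper extracts labels or how many hard examples it keeps.

```python
import re


def narrow_candidates(description, unseen_classes):
    """Return the unseen class names whose words all occur in the description.

    Assumption: plain lowercase word matching stands in for the paper's
    label extraction. Plurals ("zebras") and synonyms are not handled.
    If nothing matches, every unseen class stays a candidate.
    """
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    hits = [
        name
        for name in unseen_classes
        if all(word in tokens for word in name.lower().split())
    ]
    return hits or list(unseen_classes)


def ohem_select(losses, keep):
    """Return the indices of the `keep` highest-loss examples, in index order.

    This is the selection step of OHEM: only the hardest examples
    contribute to the update. `keep` is a hypothetical hyperparameter.
    """
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    unseen = ["zebra", "truck", "sofa"]
    text = "A zebra stands next to a parked truck."
    print(narrow_candidates(text, unseen))
    print(ohem_select([0.1, 2.3, 0.7, 1.5], keep=2))
```

In a full detector the per-example losses would come from the region proposals of a model such as Faster R-CNN, and the narrowed class list would restrict which unseen-class scores are compared at prediction time; both connections are assumptions about how such a sketch would be wired in.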