Described Object Detection: Liberating Object Detection with Flexible Expressions

Cited by: 0
Authors
Xie, Chi [1]
Zhang, Zhao [2]
Wu, Yixuan [3]
Zhu, Feng [2]
Zhao, Rui [2]
Liang, Shuang [1]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Sensetime Res, Hong Kong, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
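Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
LANGUAGE
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract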
Detecting objects based on language information is a popular task that includes Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance both to a more practical setting called Described Object Detection (DOD) by expanding OVD's category names to flexible language expressions and by overcoming REC's limitation of grounding only pre-existing objects. We establish the research foundation for DOD by constructing a Description Detection Dataset (D3). This dataset features flexible language expressions, from short category names to long descriptions, and annotates all described objects on all images without omission. By evaluating previous SOTA methods on D3, we identify several problems that cause current REC, OVD, and bi-functional methods to fail. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods are limited by long and complex descriptions. Recent bi-functional methods also perform poorly on DOD because their training procedures and inference strategies for the REC and OVD tasks are separate. Building on these findings, we propose a baseline that substantially improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at this URL and related works are tracked in this repo.
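The contrast the abstract draws between REC-style and DOD-style outputs can be illustrated with a minimal sketch. This is not the authors' baseline or any published API: the `Candidate` type, the two function names, the box format, and the 0.5 threshold are all hypothetical, chosen only to show why a detector that always returns one box cannot reject a negative description or report several targets.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Candidate:
    box: Box
    score: float  # hypothetical confidence that the box matches the description


def rec_style(candidates: List[Candidate]) -> List[Candidate]:
    """REC-style output: always return the single best-scoring box.

    The description is assumed to refer to exactly one object, so a
    box is returned even when nothing in the image actually matches.
    """
    if not candidates:
        return []
    return [max(candidates, key=lambda c: c.score)]


def dod_style(candidates: List[Candidate], threshold: float = 0.5) -> List[Candidate]:
    """DOD-style output: keep every candidate at or above the threshold.

    The result may hold zero boxes (a negative description) or
    several boxes (a multi-target description).
    """
    return [c for c in candidates if c.score >= threshold]


# Two objects match the description well, one does not.
multi_target = [
    Candidate((10, 10, 50, 50), 0.9),
    Candidate((60, 10, 90, 50), 0.8),
    Candidate((0, 0, 5, 5), 0.1),
]

# Nothing in the image matches the description.
negative = [
    Candidate((10, 10, 50, 50), 0.2),
    Candidate((60, 10, 90, 50), 0.1),
]
```

On `multi_target`, `rec_style` keeps one box while `dod_style` keeps two; on `negative`, `rec_style` still returns a box while `dod_style` returns none. A fixed threshold only works if the scores are well calibrated, which is one reason the abstract lists confidence scores among the weaknesses of REC methods.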
Pages: 13