Described Object Detection: Liberating Object Detection with Flexible Expressions

Authors
Xie, Chi [1]
Zhang, Zhao [2]
Wu, Yixuan [3]
Zhu, Feng [2]
Zhao, Rui [2]
Liang, Shuang [1]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Sensetime Res, Hong Kong, Peoples R China
[3] Zhejiang Univ, Hangzhou, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
LANGUAGE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting objects based on language information is a popular task that includes Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called Described Object Detection (DOD) by expanding the category names of OVD to flexible language expressions and by overcoming the limitation that REC only grounds objects presumed to exist in the image. We establish the research foundation for DOD by constructing a Description Detection Dataset (D3). This dataset features flexible language expressions, ranging from short category names to long descriptions, and annotates all described objects on all images without omission. By evaluating previous SOTA methods on D3, we identify failure cases that trouble current REC, OVD, and bi-functional methods. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods are limited when handling long and complex descriptions. Recent bi-functional methods also do not work well on DOD because their training procedures and inference strategies for the REC and OVD tasks are separate. Building on these findings, we propose a baseline that substantially improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at this URL and related works are tracked in this repo.
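To make the contrast between the REC and DOD settings concrete, here is a minimal sketch. It is not the authors' baseline; it only illustrates the output behavior the abstract describes. The names `Detection`, `rec_style_output`, and `dod_style_output`, and the 0.5 threshold, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """A candidate box scored against one language description (hypothetical type)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float                            # confidence that the box matches the description


def rec_style_output(candidates: List[Detection]) -> List[Detection]:
    """REC-style output: return the single top-scoring box.

    This presumes the described object exists in the image, so it cannot
    reject a negative image or return several targets.
    """
    if not candidates:
        return []
    return [max(candidates, key=lambda d: d.score)]


def dod_style_output(candidates: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """DOD-style output: keep every box whose score passes a threshold.

    The result may hold zero, one, or many boxes. The threshold value is an
    assumption for illustration only.
    """
    return [d for d in candidates if d.score >= threshold]


# Image with two instances matching the description, plus one distractor.
positive_image = [
    Detection((10, 10, 50, 50), 0.9),
    Detection((60, 10, 100, 50), 0.8),
    Detection((5, 70, 40, 100), 0.1),
]
# Image where nothing matches the description.
negative_image = [
    Detection((10, 10, 50, 50), 0.2),
    Detection((60, 10, 100, 50), 0.1),
]

print(len(rec_style_output(positive_image)), len(dod_style_output(positive_image)))
print(len(rec_style_output(negative_image)), len(dod_style_output(negative_image)))
```

In this toy setup the top-1 rule misses the second target on the positive image and still returns a box on the negative image. The thresholded rule handles both cases only if the confidence scores are well calibrated, which is the difficulty the abstract attributes to REC methods.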
Pages: 13