Towards Interpretable Object Detection by Unfolding Latent Structures

Cited by: 14

Authors:

Wu, Tianfu [1,2]

Song, Xi

Affiliations:

[1] NC State Univ, Dept ECE, Raleigh, NC 27695 USA

[2] NC State Univ, Visual Narrat Initiat, Raleigh, NC 27695 USA

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019

DOI:

10.1109/ICCV.2019.00613

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper first proposes a method of formulating model interpretability in visual understanding tasks based on the idea of unfolding latent structures. It then presents a case study in object detection using popular two-stage region-based convolutional network (i.e., R-CNN) detection systems [19, 50, 7, 23]. The proposed method focuses on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection, without using any supervision for part configurations. It utilizes a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of regions of interest (RoIs). It presents an AOGParsing operator that seamlessly integrates with the RoIPooling [19]/RoIAlign [23] operator widely used in R-CNN and is trained end-to-end. In object detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the qualitatively extractive rationale generated for interpreting detection. In experiments, Faster R-CNN [50] is used to test the proposed method on the PASCAL VOC 2007 [13] and the COCO 2017 [40] object detection datasets. The experimental results show that the proposed method can compute promising latent structures without hurting the performance. The code and pretrained models are available at https://github.com/iVMCL/iRCNN.

Pages: 6032-6042

Number of pages: 11

