Attentive Contexts for Object Detection

Cited by: 181

Authors:

Li, Jianan [1]

Wei, Yunchao [2]

Liang, Xiaodan [3]

Dong, Jian [4]

Xu, Tingfa [1]

Feng, Jiashi [4]

Yan, Shuicheng [4]

Affiliations:

[1] Beijing Inst Technol, Sch Opt Engn, Beijing 100081, Peoples R China

[2] Beijing Jiaotong Univ, Beijing 100044, Peoples R China

[3] Sun Yat Sen Univ, Guangzhou 510006, Guangdong, Peoples R China

[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2017 / Vol. 19 / No. 05

Keywords:

Context; neural networks; object detection;

DOI:

10.1109/TMM.2016.2642789

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Modern deep neural network-based object detection methods typically classify candidate proposals using their interior features. However, the global and local surrounding contexts, which are believed to be valuable for object detection, are not yet fully exploited by existing methods. In this work, we take a step towards understanding what constitutes a robust practice for extracting and utilizing contextual information to facilitate object detection. Specifically, we consider the following two questions: "how to identify useful global contextual information for detecting a certain object?" and "how to exploit local context surrounding a proposal for better inferring its contents?" We provide preliminary answers to these questions by developing a novel attention to context convolution neural network (AC-CNN)-based object detection model. AC-CNN effectively incorporates global and local contextual information into the region-based CNN (e.g., Fast R-CNN and Faster R-CNN) detection framework and provides better object detection performance. It consists of one attention-based global contextualized (AGC) subnetwork and one multi-scale local contextualized (MLC) subnetwork. To capture global context, the AGC subnetwork recurrently generates an attention map for an input image to highlight useful global contextual locations, through multiple stacked long short-term memory layers. To capture surrounding local context, the MLC subnetwork exploits both the inside and outside contextual information of each specific proposal at multiple scales. The global and local context are then fused to make the final detection decision. Extensive experiments on PASCAL VOC 2007 and VOC 2012 demonstrate the superiority of the proposed AC-CNN over well-established baselines.
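The two ideas named in the abstract, an attention map over global locations and proposal context at multiple scales, can be illustrated with a minimal sketch. This is not the authors' AC-CNN implementation: the function names `scale_box`, `softmax`, and `attend`, the scale factors, and the use of a plain softmax in place of the stacked LSTM layers are all assumptions made for illustration only.

```python
import math

def scale_box(box, factor, img_w, img_h):
    """Scale an (x1, y1, x2, y2) box about its center, clipped to the image.

    Hypothetical helper: a factor below 1 looks inside the proposal,
    a factor above 1 takes in surrounding local context.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * factor / 2.0
    half_h = (y2 - y1) * factor / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))

def softmax(scores):
    """Numerically stable softmax over a list of location scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Attention-weighted sum of per-location feature vectors.

    Assumption: scores are given directly; in the paper they would be
    produced recurrently by LSTM layers, which this sketch omits.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]

# Inside and outside views of one proposal (scale factors are illustrative).
proposal = (40, 40, 80, 80)
inner = scale_box(proposal, 0.5, 90, 90)
outer = scale_box(proposal, 2.0, 90, 90)

# Global context vector from two locations with equal attention scores.
global_context = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a full detector, features pooled from the scaled boxes and the attended global vector would be concatenated before classification; that fusion step is left out here.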


Pages: 944-954

Page count: 11

