Visual Grounding in Remote Sensing Images

Cited by: 23

Authors:

Sun, Yuxi [1]

Feng, Shanshan [1]

Li, Xutao [1]

Ye, Yunming [1]

Kang, Jian [2]

Huang, Xu [1]

Affiliations:

[1] Harbin Institute of Technology, Shenzhen, People's Republic of China

[2] Soochow University, Suzhou, People's Republic of China

Source:

Proceedings of the 30th ACM International Conference on Multimedia, MM 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

dataset; object retrieval; visual grounding; remote sensing; referring expression;

DOI:

10.1145/3503161.3548316

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Retrieving ground objects from large-scale remote sensing images is important for many applications. We present a novel problem: visual grounding in remote sensing images. Visual grounding aims to locate particular objects (in the form of a bounding box or segmentation mask) in an image from a natural language expression. The task already exists in the computer vision community. However, existing benchmark datasets and methods mainly focus on natural images rather than remote sensing images. Compared with natural images, remote sensing images contain large-scale scenes and the geographical spatial information of ground objects (e.g., longitude, latitude). Existing methods cannot deal with these challenges. In this paper, we collect a new visual grounding dataset, called RSVG, and design a new method, namely GeoVG. In particular, the proposed method consists of a language encoder, an image encoder, and a fusion module. The language encoder learns numerical geospatial relations and represents a complex expression as a geospatial relation graph. The image encoder learns large-scale remote sensing scenes with adaptive region attention. The fusion module fuses the text and image features for visual grounding. We evaluate the proposed method by comparing it with state-of-the-art methods on RSVG. Experiments show that our method outperforms previous methods on the proposed dataset. https://sunyuxi.github.io/publication/GeoVG
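
To illustrate the geospatial relation graph idea mentioned in the abstract, the following minimal Python sketch represents one referring expression as a small graph whose nodes are ground objects and whose edges carry a spatial relation with an optional numerical distance. The class names, fields, and the example expression are assumptions made for illustration only; they are not taken from GeoVG or the RSVG dataset.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Directed edge: `subject` stands in `relation` to `reference` (hypothetical schema)."""
    subject: str
    relation: str
    reference: str
    distance_m: Optional[float] = None  # numerical geospatial cue, if the expression has one


@dataclass
class RelationGraph:
    """Toy geospatial relation graph for a single referring expression."""
    target: str
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_relation(self, subject, relation, reference, distance_m=None):
        # Register both endpoints as nodes, then store the directed edge.
        self.nodes.update((subject, reference))
        self.edges.append(Relation(subject, relation, reference, distance_m))

    def relations_of(self, node):
        """Return all edges whose subject is `node`."""
        return [e for e in self.edges if e.subject == node]


# Hypothetical expression: "the baseball field about 200 m north of the parking lot"
graph = RelationGraph(target="baseball field")
graph.nodes.add(graph.target)
graph.add_relation("baseball field", "north of", "parking lot", distance_m=200.0)
```

In a sketch like this, the node marked as `target` is the object to be localized, while the remaining nodes act as reference objects that constrain where it can be in the scene.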


Pages: 9
