Visual Grounding in Remote Sensing Images

Cited by: 23
Authors
Sun, Yuxi [1]
Feng, Shanshan [1]
Li, Xutao [1]
Ye, Yunming [1]
Kang, Jian [2]
Huang, Xu [1]
Affiliations
[1] Harbin Institute of Technology, Shenzhen, People's Republic of China
[2] Soochow University, Suzhou, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
dataset; object retrieval; visual grounding; remote sensing; referring expression;
DOI
10.1145/3503161.3548316
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Retrieving ground objects from large-scale remote sensing images is important for many applications. We present a novel problem: visual grounding in remote sensing images. Visual grounding aims to locate particular objects (in the form of a bounding box or segmentation mask) in an image from a natural language expression. The task already exists in the computer vision community; however, existing benchmark datasets and methods mainly focus on natural images rather than remote sensing images. Compared with natural images, remote sensing images contain large-scale scenes and the geographical spatial information of ground objects (e.g., longitude, latitude). Existing methods cannot deal with these challenges. In this paper, we collect a new visual grounding dataset, called RSVG, and design a new method, namely GeoVG. In particular, the proposed method consists of a language encoder, an image encoder, and a fusion module. The language encoder learns numerical geospatial relations and represents a complex expression as a geospatial relation graph. The image encoder learns large-scale remote sensing scenes with adaptive region attention. The fusion module fuses the text and image features for visual grounding. We evaluate the proposed method by comparing it with state-of-the-art methods on RSVG. Experiments show that our method outperforms previous methods on the proposed dataset. https://sunyuxi.github.io/publication/GeoVG
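The abstract does not say how a predicted bounding box is scored against the ground truth. A common convention in visual grounding work, assumed here and not taken from this record, is to count a prediction as correct when its intersection over union (IoU) with the annotated box reaches 0.5. The minimal sketch below illustrates that convention; the names `box_iou` and `grounding_accuracy` and the 0.5 threshold are hypothetical choices for illustration, not the GeoVG evaluation code.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so that disjoint boxes give no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box], targets: Sequence[Box], threshold: float = 0.5
) -> float:
    """Fraction of predictions whose IoU with the target meets the threshold.

    The 0.5 default is an assumed convention, not a value from the source.
    """
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(preds)
```

In this sketch, each referring expression contributes one predicted box and one annotated box, and the reported score is the share of expressions grounded correctly under the chosen threshold.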
Pages: 9
Related papers
50 in total
  • [21] Fusion of remote sensing images
    Mani, V. R. S.
    Arivazhagan, S.
    JOURNAL OF THE GEOLOGICAL SOCIETY OF INDIA, 2015, 86 (06) : 726 - 732
  • [22] Deconvolution of remote sensing images
    College of Automation, University of Electronic Science and Technology of China, Chengdu 610054, China
    [Unknown]
    Shu Ju Cai Ji Yu Chu Li, 2008, 2 (168-175):
  • [23] Fusion of remote sensing images
    V. R. S. Mani
    S. Arivazhagan
    Journal of the Geological Society of India, 2015, 86 : 726 - 732
  • [24] SEGMENTATION-GUIDED ATTENTION FOR VISUAL QUESTION ANSWERING FROM REMOTE SENSING IMAGES
    Tosato, Lucrezia
    Boussaid, Hichem
    Weissgerber, Flora
    Kurtz, Camille
    Wendling, Laurent
    Lobry, Sylvain
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024, : 2750 - 2754
  • [25] Ship Detection for Optical Remote Sensing Images Based on Visual Attention Enhanced Network
    Bi, Fukun
    Hou, Jinyuan
    Chen, Liang
    Yang, Zhihua
    Wang, Yanping
    SENSORS, 2019, 19 (10)
  • [26] Quality Assessment of Remote Sensing Images Based on Deep Learning and Human Visual System
    Di, Liu
    Li Yingchun
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (06)
  • [27] Region of interest extraction based on multiscale visual saliency analysis for remote sensing images
    Zhang, Yinggang
    Zhang, Libao
    Yu, Xianchuan
    JOURNAL OF APPLIED REMOTE SENSING, 2015, 9
  • [28] Comparative Study of Visual Attention Models with Human Eye Gaze in Remote Sensing Images
    Amudha, J.
    Radha, D.
    Deepa, A. S.
    PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 445 - 450
  • [29] VISUAL SALIENCY ANALYSIS FOR COMMON REGION OF INTEREST DETECTION IN MULTIPLE REMOTE SENSING IMAGES
    Zhang, Libao
    Sun, Qiaoyue
    Sun, Yang
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2316 - 2320
  • [30] A Campus Landscape Visual Evaluation Method Integrating PixScape and UAV Remote Sensing Images
    Song, Lili
    Wu, Moyu
    BUILDINGS, 2025, 15 (01)