Automatic Building Extraction on High-Resolution Remote Sensing Imagery Using Deep Convolutional Encoder-Decoder With Spatial Pyramid Pooling

Cited by: 88
Authors
Liu, Yaohui [1, 2]
Gross, Lutz [2]
Li, Zhiqiang [3]
Li, Xiaoli [3]
Fan, Xiwei [1]
Qi, Wenhua [1]
Affiliations
[1] China Earthquake Adm, Inst Geol, Beijing 100029, Peoples R China
[2] Univ Queensland, Sch Earth & Environm Sci, Brisbane, Qld 4072, Australia
[3] China Earthquake Networks Ctr, Beijing 100045, Peoples R China
Keywords
Deep learning; high-resolution remote sensing imagery; building extraction; fully convolutional networks; encoder-decoder; SCALE; CLASSIFICATION
DOI
10.1109/ACCESS.2019.2940527
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic extraction of buildings from remote sensing imagery plays a significant role in many applications, such as urban planning and monitoring changes to land cover. Various building segmentation methods have been proposed for visible remote sensing images, especially state-of-the-art methods based on convolutional neural networks (CNNs). However, high-accuracy building segmentation from high-resolution remote sensing imagery is still a challenging task because of the potentially complex textures of both buildings and image backgrounds. Repeated pooling and striding operations used in CNNs reduce feature resolution, causing a loss of detailed information. To address this issue, we propose a light-weight deep learning model integrating spatial pyramid pooling with an encoder-decoder structure. The proposed model takes advantage of a spatial pyramid pooling module to capture and aggregate multi-scale contextual information, and of the ability of encoder-decoder networks to restore lost information. The proposed model is evaluated on two publicly available datasets: the Massachusetts roads and buildings dataset and the INRIA Aerial Image Labeling Dataset. The experimental results on these datasets show qualitative and quantitative improvements over established image segmentation models, including SegNet, FCN, U-Net, Tiramisu, and FRRN. For instance, compared to the standard U-Net, the overall accuracy gain is 1.0% (0.913 vs. 0.904) and 3.6% (0.909 vs. 0.877), with a maximal increase of 3.6% in model-training time on these two datasets. These results demonstrate that the proposed model has the potential to deliver automatic building segmentation from high-resolution remote sensing images at an accuracy that makes it a useful tool for practical application scenarios.
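The multi-scale aggregation that the abstract attributes to spatial pyramid pooling can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names (`adaptive_avg_pool`, `upsample_nearest`, `pyramid_pool`), the bin sizes (1, 2, 4), the single-channel input, and the nearest-neighbour upsampling are all assumptions chosen for illustration only.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one 2D feature map (a single channel)


def adaptive_avg_pool(fmap: Grid, bins: int) -> Grid:
    """Average-pool a 2D map into a bins x bins grid of region means."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Region boundaries: floor for the start, ceiling for the end.
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid: Grid, h: int, w: int) -> Grid:
    """Resize a square grid back to h x w by nearest-neighbour lookup."""
    b = len(grid)
    return [[grid[(r * b) // h][(c * b) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(fmap: Grid, bin_sizes: Sequence[int] = (1, 2, 4)) -> List[Grid]:
    """Return the input map plus one context map per pyramid level.

    Each level summarises the input at a different spatial scale; stacking
    the levels as channels is the "aggregate multi-scale context" step.
    (Bin sizes are hypothetical, not taken from the paper.)
    """
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return channels
```

In a real network the stacked channels would typically pass through a learned convolution before reaching the decoder; that step is omitted here because the abstract does not specify it.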
Pages: 128774-128786
Number of pages: 13