Cross-View Semantic Segmentation for Sensing Surroundings

Cited by: 153
Authors
Pan, Bowen [1]
Sun, Jiankai [2]
Leung, Ho Yin Tiga [2]
Andonian, Alex [1]
Zhou, Bolei [2]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Peoples R China
Keywords
Semantic scene understanding; deep learning for visual perception; visual learning; visual-based navigation; computer vision for other robotic applications
DOI
10.1109/LRA.2020.3004325
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from the observations. To equip robot perception with such a surrounding-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation, as well as a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map indicating the spatial location of all objects at the pixel level. The main challenge of this task is the lack of real-world annotations for top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively make use of information from different views and multiple modalities to understand spatial information. Our further experiment on a LoCoBot robot shows that our model enables the surrounding-sensing capability from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
Pages: 4867-4873
Page count: 7