Cross-View Semantic Segmentation for Sensing Surroundings

Cited by: 153

Authors:

Pan, Bowen [1]

Sun, Jiankai [2]

Leung, Ho Yin Tiga [2]

Andonian, Alex [1]

Zhou, Bolei [2]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

[2] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2020, Vol. 5, No. 3

Keywords:

Semantic scene understanding; deep learning for visual perception; visual learning; visual-based navigation; computer vision for other robotic applications;

DOI:

10.1109/LRA.2020.3004325

Chinese Library Classification:

TP24 [Robotics];

Discipline classification codes:

080202 ; 1405 ;

Abstract:

Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from the observations. To equip robot perception with such a surrounding-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation as well as a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map indicating the spatial location of all objects at the pixel level. The main issue of this task is the lack of real-world annotations of top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively make use of information from different views and multiple modalities to understand spatial information. Our further experiment on a LoCoBot robot shows that our model enables surrounding sensing from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.


Pages: 4867-4873

Page count: 7

Related literature (50 items in total)

[21] Cross-view Convolutional Networks
Jacobs, Nathan
Workman, Scott
Zhai, Menghua
2016 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2016
[22] CAS-Net: Cross-View Aligned Segmentation by Graph Representation of Knees
Zhuang, Zixu
Wang, Xin
Wang, Sheng
Shen, Zhenrong
Zhao, Xiangyu
Liu, Mengjun
Xue, Zhong
Shen, Dinggang
Zhang, Lichi
Wang, Qian
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223 : 110 - 119
[23] Cross-View Image Geolocalization
Lin, Tsung-Yi
Belongie, Serge
Hays, James
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 891 - 898
[24] An Efficient Method based on Multi-view Semantic Alignment for Cross-view Geo-localization
Wang, Yifeng
Xia, Yamei
Lu, Tianbo
Zhang, Xiaoyan
Yao, Wenbin
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
[25] Cross-view discrepancy-dependency network for volumetric medical image segmentation
Zhong, Shengzhou
Wang, Wenxu
Feng, Qianjin
Zhang, Yu
Ning, Zhenyuan
MEDICAL IMAGE ANALYSIS, 2025, 99
[26] X-Align++: cross-modal cross-view alignment for Bird’s-eye-view segmentation
Shubhankar Borse
Marvin Klingner
Varun Ravi
Hong Cai
Abdulaziz Almuzairee
Senthil Yogamani
Fatih Porikli
Machine Vision and Applications, 2023, 34
[27] X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation
Borse, Shubhankar
Klingner, Marvin
Kumar, Varun Ravi
Cai, Hong
Almuzairee, Abdulaziz
Yogamani, Senthil
Porikli, Fatih
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 3286 - 3296
[28] SimH: A Supervised Cross-View Hashing Framework Preserving Semantic Similarities in Hamming Space
Xia Shijun
Gu Zhongyuan
Ge Shengbin
Hu Weijin
8TH INTERNATIONAL CONFERENCE ON INTERNET MULTIMEDIA COMPUTING AND SERVICE (ICIMCS2016), 2016, : 217 - 222
[29] CVLNet: Cross-view Semantic Correspondence Learning for Video-Based Camera Localization
Shi, Yujiao
Yu, Xin
Wang, Shan
Li, Hongdong
COMPUTER VISION - ACCV 2022, PT I, 2023, 13841 : 123 - 141
[30] Cross-View Geo-Localization via Learning Correspondence Semantic Similarity Knowledge
Chen, Guanli
Huang, Guoheng
Yuan, Xiaochen
Chen, Xuhang
Zhong, Guo
Pun, Chi-Man
MULTIMEDIA MODELING, MMM 2025, PT I, 2025, 15520 : 220 - 233
