Learning Spatiotemporal Features for Video Semantic Segmentation Using 3D Convolutional Neural Networks

Authors
Chen, Jiamin [1]
Wang, Mingchen [1]
Jiang, Shang [1]
Huang, Bin [1]
Sun, Hongbo [1]
Affiliations
[1] Beijing Inst Technol, Sch Appl Sci & Civil Engn, Zhuhai, Peoples R China
Keywords
Video semantic segmentation; 3D networks; 3D HRNetV2
DOI
10.1109/ISCSIC57216.2022.00023
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, significant progress has been made in still-image segmentation. However, applying these advanced algorithms to every video frame is computationally expensive. This paper makes two main contributions. The first is a new dataset: a human semantic segmentation video dataset built on the Refer-YouTube-VOS dataset, which provides a benchmark for evaluating video semantic segmentation models. The second is a video semantic segmentation architecture suited to spatiotemporal feature learning, together with a method for converting 2D networks into 3D networks. Experiments show that the 3D network outperforms the 2D network on our dataset. 3D HRNetV2 performs best, with an mIoUv of 61.72%, 14.89% higher than 2D HRNetV2.
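The abstract does not say how the 2D networks are converted into 3D networks. As an illustration only, the sketch below uses kernel inflation, a commonly used technique that may differ from the authors' method: a 2D kernel is repeated along a new temporal axis and scaled by 1/depth. The names `inflate_kernel`, `response_2d`, and `response_3d` are hypothetical helpers introduced for this example.

```python
def inflate_kernel(kernel_2d, depth):
    """Inflate a 2D kernel (list of rows) into a 3D kernel with `depth` slices.

    Assumption: inflation by repetition. Each temporal slice is the 2D kernel
    scaled by 1/depth, so applying the 3D kernel to `depth` identical frames
    gives the same response as the 2D kernel on a single frame.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(kernel_2d, patch):
    """Dot product of a 2D kernel with an equally sized image patch."""
    return sum(
        w * x
        for k_row, p_row in zip(kernel_2d, patch)
        for w, x in zip(k_row, p_row)
    )


def response_3d(kernel_3d, clip):
    """Dot product of a 3D kernel with an equally sized clip (frames x rows x cols)."""
    return sum(response_2d(k_slice, frame) for k_slice, frame in zip(kernel_3d, clip))


# Illustrative values only: a 3x3 edge kernel and a 3x3 image patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 1.0], [5.0, 2.0, 2.0]]

kernel_3d = inflate_kernel(kernel, depth=3)
static_clip = [patch, patch, patch]  # a "video" in which nothing moves
```

On a static clip the inflated kernel reproduces the 2D response (up to floating-point rounding), so a network initialized this way starts from the behavior of its 2D counterpart and can then learn temporal structure during training.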
Pages: 55-62
Number of pages: 8