TRCDNet: A Transformer Network for Video Cloud Detection

Cited by: 2
Authors
Luo, Chen [1, 2]
Feng, Shanshan [1, 2]
Quan, Yingling [3]
Ye, Yunming [1, 2]
Li, Xutao [1, 2]
Xu, Yong [3]
Zhang, Baoquan [3]
Chen, Zhihao [3]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci, Shenzhen 518055, Peoples R China
[2] Harbin Inst Technol, Shenzhen Key Lab Internet Informat Collaborat, Shenzhen, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518071, Peoples R China
Keywords
Cloud detection on geostationary satellite images; Fengyun-4A satellites; video cloud detection; shadow detection; algorithms; features; imagery; fusion
DOI
10.1109/TGRS.2023.3288543
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Detecting and removing cloudy areas is a critical step in remote-sensing image (RSI) preprocessing. Recently, cloud detection methods based on deep neural networks have outperformed traditional methods. Current approaches mostly focus on cloud detection in a single image captured by a polar-orbiting satellite. However, there is another type of meteorological satellite, the geostationary satellite, which can capture temporally consecutive frames of a particular location. Cloud detection for a geostationary satellite can therefore be treated as a video cloud detection task. In addition to extracting features from a single image, extracting and making full use of the relations between sequential frames is also important. To tackle this problem, we design a deep-learning video cloud detection model: transformer network for video cloud detection (TRCDNet). The proposed network is based on the encoder-decoder structure. In the encoder, the ContextGhostLayer module is proposed to encode more semantic information and to handle challenging cases such as thin clouds in RSIs. In addition, we design a transformer-based video sequence transformer (VSTR) block. Based on the attention mechanism, VSTR can extract the relations across frames. In the proposed decoder, the cloud masks are gradually recovered to the same scale as the input image. To evaluate the methods, we create a video cloud detection dataset, Fengyun4aCloud, based on videos captured by the Fengyun 4 (FY-4) satellite. Extensive experiments comparing current cloud detection methods, semantic segmentation methods, and video semantic segmentation (VSS) methods indicate that the designed TRCDNet achieves state-of-the-art performance in video cloud detection.
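The abstract does not give the internals of the VSTR block, so the following is only a minimal sketch of the general idea it names: attention applied across frames. It assumes each frame has already been encoded into one feature vector per spatial location, and it uses plain scaled dot-product self-attention with identity projections. The function name `cross_frame_attention` and the toy feature values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(frames):
    """Scaled dot-product self-attention over the time axis.

    frames: list of T feature vectors, one per frame, all for the same
            spatial location (an assumption made for this sketch).
    Returns (outputs, weights): outputs[t] is a weighted mixture of all
    frames' features, and weights[t] holds the attention of frame t
    over every frame in the sequence.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame (identity projections).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        w = softmax(scores)
        # Mix the frame features using the attention weights.
        mixed = [sum(w[t] * frames[t][d] for t in range(len(frames)))
                 for d in range(dim)]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


# Toy sequence: frames 0 and 1 look alike, frame 2 differs.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = cross_frame_attention(frames)
```

In this toy sequence, frame 0 places more weight on the similar frame 1 than on the dissimilar frame 2, which is the kind of temporal relation a cross-frame attention block is meant to exploit. A real model would add learned query, key, and value projections, multiple heads, and positional information.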
Pages: 14