SCOTCH and SODA: A Transformer Video Shadow Detection Framework

Cited by: 11
Authors
Liu, Lihao [1]
Prost, Jean [2]
Zhu, Lei [3, 4]
Papadakis, Nicolas [2]
Lio, Pietro [1]
Schonlieb, Carola-Bibiane [1]
Aviles-Rivero, Angelica I. [1]
Affiliations
[1] Univ Cambridge, Cambridge, England
[2] Univ Bordeaux, CNRS, Bordeaux INP, IMB,UMR 5251, F-33400 Talence, France
[3] Hong Kong Univ Sci & Technol Guangzhou, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1109/CVPR52729.2023.01007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Shadows in videos are difficult to detect because of the large shadow deformation between frames. In this work, we argue that accounting for shadow deformation is essential when designing a video shadow detection method. To this end, we introduce the shadow deformation attention trajectory (SODA), a new type of video self-attention module, specially designed to handle the large shadow deformations in videos. Moreover, we present a new shadow contrastive learning mechanism (SCOTCH) which aims at guiding the network to learn a unified shadow representation from massive positive shadow pairs across different videos. We demonstrate empirically the effectiveness of our two contributions in an ablation study. Furthermore, we show that SCOTCH and SODA significantly outperforms existing techniques for video shadow detection. Code is available at the project page: https://lihaoliucambridge.github.io/scotch_and_soda/
Pages: 10449-10458
Page count: 10
Related papers
50 in total
  • [21] A framework for background detection in video
    Qing, LY
    Wang, WQ
    Huang, TJ
    Gao, W
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2002, PROCEEDING, 2002, 2532: 799-805
  • [22] Data Efficient Video Transformer for Violence Detection
    Abdali, Almamon Rasool
    2021 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION, NETWORKS AND SATELLITE (COMNETSAT 2021), 2021: 195-199
  • [23] Video Transformer for Deepfake Detection with Incremental Learning
    Khan, Sohail Ahmed
    Dai, Hang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 1821-1828
  • [24] TRCDNet: A Transformer Network for Video Cloud Detection
    Luo, Chen
    Feng, Shanshan
    Quan, Yingling
    Ye, Yunming
    Li, Xutao
    Xu, Yong
    Zhang, Baoquan
    Chen, Zhihao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [25] TubeR: Tubelet Transformer for Video Action Detection
    Zhao, Jiaojiao
    Zhang, Yanyi
    Li, Xinyu
    Chen, Hao
    Shuai, Bing
    Xu, Mingze
    Liu, Chunhui
    Kundu, Kaustav
    Xiong, Yuanjun
    Modolo, Davide
    Marsic, Ivan
    Snoek, Cees G. M.
    Tighe, Joseph
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 13588-13597
  • [26] Deepfake Video Detection with Spatiotemporal Dropout Transformer
    Zhang, Daichi
    Lin, Fanzhao
    Hua, Yingying
    Wang, Pengju
    Zeng, Dan
    Ge, Shiming
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 5833-5841
  • [27] A Survey on Shadow Detection and Removal in Images and Video Sequences
    Tiwari, Arti
    Singh, Pradeep Kumar
    Amin, Sobia
    2016 6TH INTERNATIONAL CONFERENCE - CLOUD SYSTEM AND BIG DATA ENGINEERING (CONFLUENCE), 2016: 518-523
  • [28] Moving Object Detection and Shadow Removal in Video Surveillance
    Yan, Tinggui
    Hu, Shaohua
    Su, Xiaofeng
    He, Xinhua
    PROCEEDINGS OF 2016 10TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT & APPLICATIONS (SKIMA), 2016: 3-8
  • [29] Detection of Human Fall in Video Using Shadow Information
    Chen, Yie-Tarng
    Lin, You-Rong
    Fang, Wen-Hsien
    2012 IEEE BIOMEDICAL CIRCUITS AND SYSTEMS CONFERENCE (BIOCAS): INTELLIGENT BIOMEDICAL ELECTRONICS AND SYSTEM FOR BETTER LIFE AND BETTER ENVIRONMENT, 2012: 284-287
  • [30] Background subtraction and shadow detection in grayscale video sequences
    Jacques, JCS
    Jung, CR
    Musse, SR
    SIBGRAPI 2005: XVIII BRAZILIAN SYMPOSIUM ON COMPUTER GRAPHICS AND IMAGE PROCESSING, CONFERENCE PROCEEDINGS, 2005: 189-196