Satellite Video Multi-Label Scene Classification With Spatial and Temporal Feature Cooperative Encoding: A Benchmark Dataset and Method

Authors
Guo, Weilong [1]
Li, Shengyang [2]
Chen, Feixiang [3]
Sun, Yuhan [2]
Gu, Yanfeng [4]
Affiliations
[1] Chinese Acad Sci, Key Lab Space Utilizat & Technol, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[4] Harbin Inst Technol, Sch Elect & Informat Engn, Harbin 150001, Peoples R China
Keywords
Satellite video; multi-label scene classification; spatial and temporal; feature encoding
DOI
10.1109/TIP.2024.3374100
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Satellite video multi-label scene classification predicts semantic labels for multiple ground contents to describe a given satellite observation video, and it plays an important role in applications such as ocean observation and smart cities. However, the lack of a high-quality, large-scale dataset has hindered progress on the task, and existing methods designed for general videos struggle to represent the local details of ground contents when applied directly to satellite videos. In this paper, our contributions are twofold. (1) We develop the first publicly available, large-scale satellite video multi-label scene classification dataset. It consists of 18 classes of static and dynamic ground contents, 3549 videos, and 141960 frames. (2) We propose a baseline method with the novel Spatial and Temporal Feature Cooperative Encoding (STFCE). It exploits the relations between local spatial and temporal features and models long-term motion information hidden in inter-frame variations. In this way, it enhances the features of local details and obtains a powerful video-scene-level feature representation, which effectively improves classification performance. Experimental results show that the proposed STFCE outperforms 13 state-of-the-art methods with a global average precision (GAP) of 0.8106, and that the careful fusion and joint learning of spatial, temporal, and motion features help achieve a more robust and accurate model. Moreover, benchmarking results show that the proposed dataset is very challenging, and we hope it will promote further development of the satellite video multi-label scene classification task.
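The abstract reports results as global average precision (GAP) but this record does not define the metric. As a rough illustration only, the sketch below follows the GAP convention popularized by the YouTube-8M benchmark: the top-k predictions of every video are pooled into one list, ranked by confidence, and scored with average precision. Whether the paper uses exactly this definition, and the value `top_k=20`, are assumptions; the function name `global_average_precision` is hypothetical.

```python
from typing import Sequence


def global_average_precision(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    top_k: int = 20,  # assumption: YouTube-8M style cutoff, not taken from the paper
) -> float:
    """Pool the top-k (score, label) pairs of every video and compute AP over the pool."""
    pooled = []
    total_positives = 0
    for video_scores, video_labels in zip(scores, labels):
        # Count every ground-truth positive, including ones outside the top-k.
        total_positives += sum(video_labels)
        ranked = sorted(
            zip(video_scores, video_labels), key=lambda pair: pair[0], reverse=True
        )
        pooled.extend(ranked[:top_k])

    if total_positives == 0:
        return 0.0

    # Rank all pooled predictions by confidence, highest first.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    gap = 0.0
    for rank, (_, label) in enumerate(pooled, start=1):
        if label:
            hits += 1
            gap += hits / rank  # precision at this rank, added once per positive
    return gap / total_positives


# One video, two classes: the only true label is ranked second.
print(global_average_precision([[0.9, 0.8]], [[0, 1]]))  # prints 0.5
```

With the 18 classes described in the abstract, a cutoff of 20 would keep every class score for each video, so under this assumption the pooling step would not discard any prediction.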
Pages: 2238-2251
Number of pages: 14