Satellite Video Multi-Label Scene Classification With Spatial and Temporal Feature Cooperative Encoding: A Benchmark Dataset and Method

Cited by: 0

Authors:

Guo, Weilong [1]

Li, Shengyang [2]

Chen, Feixiang [3]

Sun, Yuhan [2]

Gu, Yanfeng [4]

Affiliations:

[1] Chinese Acad Sci, Key Lab Space Utilizat & Technol, Beijing 100094, Peoples R China

[2] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[4] Harbin Inst Technol, Sch Elect & Informat Engn, Harbin 150001, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Keywords:

Satellite video; multi-label scene classification; spatial and temporal; feature encoding

DOI:

10.1109/TIP.2024.3374100

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Satellite video multi-label scene classification predicts semantic labels of multiple ground contents to describe a given satellite observation video, and it plays an important role in applications such as ocean observation and smart cities. However, the lack of a high-quality, large-scale dataset has prevented further progress on the task, and existing methods designed for general videos have difficulty representing the local details of ground contents when applied directly to satellite videos. In this paper, our contributions are twofold. (1) We develop the first publicly available, large-scale satellite video multi-label scene classification dataset. It consists of 18 classes of static and dynamic ground contents, 3549 videos, and 141960 frames. (2) We propose a baseline method with the novel Spatial and Temporal Feature Cooperative Encoding (STFCE). It exploits the relations between local spatial and temporal features and models long-term motion information hidden in inter-frame variations. In this way, it enhances the features of local details and obtains a powerful video-scene-level feature representation, which effectively raises classification performance. Experimental results show that the proposed STFCE outperforms 13 state-of-the-art methods with a global average precision (GAP) of 0.8106, and that the careful fusion and joint learning of spatial, temporal, and motion features help achieve a more robust and accurate model. Moreover, benchmarking results show that the proposed dataset is very challenging, and we hope it will promote further development of the satellite video multi-label scene classification task.
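The abstract reports results as global average precision (GAP) but does not define the metric. The sketch below shows one common way GAP is computed for multi-label video classification, assuming the convention popularized by the YouTube-8M benchmark: keep the top-k scored labels of each video, pool them across all videos, rank the pool by confidence, and take the average precision over that single ranking. The function name `global_average_precision`, the `top_k` default of 20, and the toy inputs are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
def global_average_precision(scores, labels, top_k=20):
    """Sketch of a YouTube-8M-style GAP (assumed convention, not the paper's code).

    scores: list of per-video lists of class confidences.
    labels: list of per-video lists of 0/1 ground-truth flags, same shape.
    top_k:  number of highest-scored labels kept per video (assumed default).
    """
    pooled = []      # (score, is_positive) pairs pooled over all videos
    total_pos = 0    # total number of ground-truth positive labels

    for video_scores, video_labels in zip(scores, labels):
        total_pos += sum(video_labels)
        ranked = sorted(zip(video_scores, video_labels),
                        key=lambda pair: pair[0], reverse=True)
        pooled.extend(ranked[:top_k])

    if total_pos == 0:
        return 0.0

    # Rank the pooled predictions globally by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    ap_sum = 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            ap_sum += hits / rank   # precision at this rank

    # Normalize by all positives, so positives outside the top-k count as misses.
    return ap_sum / total_pos


# Toy example with 2 videos and 3 classes (illustrative values only).
toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
toy_labels = [[1, 0, 1], [0, 1, 0]]
print(global_average_precision(toy_scores, toy_labels, top_k=3))
```

Because all predictions are ranked in one pool, a confident wrong label in any video lowers the precision of every correct label ranked after it, which is why GAP rewards well-calibrated scores across the whole test set and not only per-video ordering.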


Pages: 2238-2251

Page count: 14
