Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Cited: 0
Authors
Smith, Jimmy T. H. [2 ,4 ]
De Mello, Shalini [1 ]
Kautz, Jan [1 ]
Linderman, Scott W. [3 ,4 ]
Byeon, Wonmin [1 ]
Affiliations
[1] NVIDIA, Santa Clara, CA USA
[2] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[4] Stanford Univ, Wu Tsai Neurosci Inst, Stanford, CA 94305 USA
Keywords
TIME;
DOI
None available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences. Here, we address the challenges of prior methods and introduce convolutional state space models (ConvSSM) that combine the tensor modeling ideas of ConvLSTM with the long sequence modeling approaches of state space methods such as S4 and S5. First, we demonstrate how parallel scans can be applied to convolutional recurrences to achieve subquadratic parallelization and fast autoregressive generation. We then establish an equivalence between the dynamics of ConvSSMs and SSMs, which motivates parameterization and initialization strategies for modeling long-range dependencies. The result is ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling. ConvS5 significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3x faster than ConvLSTM and generating samples 400x faster than Transformers. In addition, ConvS5 matches or exceeds the performance of state-of-the-art methods on challenging DMLab, Minecraft and Habitat prediction benchmarks, and enables new directions for modeling long spatiotemporal sequences.
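To illustrate the parallel-scan idea the abstract refers to, here is a minimal sketch of an associative scan over a linear recurrence on spatial tensors. It assumes diagonal (elementwise) state dynamics, in the spirit of S5-style parameterizations; the function name `parallel_scan_diag` and the Hillis-Steele doubling scheme are illustrative choices, not the paper's implementation (ConvS5 applies the same principle to convolutional state transitions).

```python
import numpy as np

def parallel_scan_diag(a, u):
    """Inclusive parallel scan (Hillis-Steele doubling) for the linear
    recurrence x_t = a * x_{t-1} + u_t, applied elementwise over spatial
    tensors, assuming diagonal (elementwise) dynamics `a`.

    a: transition tensor of shape (C, H, W)
    u: driving inputs of shape (T, C, H, W)
    Returns x of shape (T, C, H, W), with initial state x_{-1} = 0.
    """
    T = u.shape[0]
    # Each position t carries a pair (A_t, b_t) encoding x_t = A_t * x_{-1} + b_t.
    A = np.broadcast_to(a, u.shape).copy()
    b = u.copy()
    # Associative combine of an earlier pair 1 with a later pair 2:
    #   (A1, b1) o (A2, b2) = (A2 * A1, A2 * b1 + b2)
    # O(log T) doubling rounds combine each position with its predecessor at
    # distance `offset`; each round is fully parallel across t.
    offset = 1
    while offset < T:
        b[offset:] = A[offset:] * b[:-offset] + b[offset:]  # new b2 = A2*b1 + b2
        A[offset:] = A[offset:] * A[:-offset]               # new A2 = A2*A1
        offset *= 2
    return b
```

Because the combine operator is associative, the same recurrence that a sequential RNN unrolls in O(T) steps can be evaluated in O(log T) parallel rounds, which is the source of the training-speed advantage over ConvLSTM-style sequential updates.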
Pages: 40