Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

Cited by: 0

Authors:

Yoo, Jaehoon [1]

Kim, Semin [1]

Lee, Doyup [2]

Kim, Chiheon [2]

Hong, Seunghoon [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea

[2] Kakao Brain, Seongnam, South Korea

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

National Research Foundation, Singapore

DOI:

10.1109/CVPR52729.2023.02192

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Autoregressive transformers have shown remarkable success in video generation. However, they cannot directly learn long-term dependencies in videos because of the quadratic complexity of self-attention, and they inherently suffer from slow inference and error propagation because of the autoregressive process. In this paper, we propose the Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependencies in videos and fast inference. Building on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves linear time complexity in both encoding and decoding by projecting observable context tokens into a fixed number of latent tokens and conditioning on them to decode the masked tokens through cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvement over autoregressive transformers in both quality and speed when generating moderately long videos. Videos and code are available at https://sites.google.com/view/mebt-cvpr2023.
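
The latent-bottleneck idea in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes single-head attention with no learned projections, no latent self-attention layers, and no positional embeddings, and every name in it (`cross_attention`, `latent_bottleneck_decode`, `score_count`, the token counts) is hypothetical. It only shows why routing attention through a fixed number of latent tokens makes the number of attention scores grow linearly with the number of video tokens instead of quadratically.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention (single head, no learned weights).

    Computes len(queries) * len(keys) attention scores.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def latent_bottleneck_decode(context, latents, mask_queries):
    """Assumed two-step scheme: encode context into latents, then decode."""
    # Encode: M latent tokens attend to N_ctx context tokens (M * N_ctx scores).
    encoded = cross_attention(latents, context, context)
    # Decode: N_mask masked-position queries attend to M latents (N_mask * M scores).
    return cross_attention(mask_queries, encoded, encoded)


def score_count(n_ctx, n_mask, n_latent):
    """Attention scores used by the bottleneck: linear in n_ctx + n_mask."""
    return n_latent * n_ctx + n_mask * n_latent


def full_attention_count(n_tokens):
    """Attention scores used by full self-attention: quadratic in n_tokens."""
    return n_tokens * n_tokens


def rand_tokens(n, d, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
D, M = 8, 4                              # feature size and latent count (illustrative)
context = rand_tokens(12, D, rng)        # observed (unmasked) tokens
latents = rand_tokens(M, D, rng)         # fixed number of latent tokens
mask_queries = rand_tokens(20, D, rng)   # queries for masked positions
decoded = latent_bottleneck_decode(context, latents, mask_queries)
```

With the latent count held fixed, doubling the number of context and masked tokens doubles `score_count`, whereas `full_attention_count` over the same tokens quadruples.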

Pages: 22888-22897

Page count: 10
