Video Synthesis via Transform-Based Tensor Neural Network

Cited by: 7

Authors:

Zhang, Yimeng [1,2]

Liu, Xiao-Yang [1,2]

Wu, Bo [3]

Walid, Anwar [4]

Affiliations:

[1] Tensor & Deep Learning Lab, New York, NY USA

[2] Columbia Univ, New York, NY USA

[3] MIT IBM Watson AI Lab, Cambridge, MA 02142 USA

[4] Nokia Bell Labs, Murray Hill, NJ USA

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Keywords:

Video synthesis; transform-based tensor; tensor neural network; interpolation and prediction; deep unfolding; STABILIZATION; SHRINKAGE;

DOI:

10.1145/3394171.3413527

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video frame synthesis is an important task in computer vision and has drawn great interest across a wide range of applications. However, existing neural network methods do not explicitly impose tensor low-rankness on videos to capture spatiotemporal correlations in a high-dimensional space, while existing iterative algorithms require hand-crafted parameters and have relatively long running times. In this paper, we propose a novel multi-phase deep neural network, Transform-Based Tensor-Net, that exploits the low-rank structure of video data in a learned transform domain by unfolding an Iterative Shrinkage-Thresholding Algorithm (ISTA) for tensor signal recovery. Our design is based on two observations: (i) both linear and nonlinear transforms can be implemented by a neural network layer, and (ii) the soft-thresholding operator corresponds to an activation function. Further, such an unfolding design achieves nearly real-time performance at the cost of training time and is interpretable as a byproduct. Experimental results on the KTH and UCF-101 datasets show that, compared with the state-of-the-art methods DVF and Super SloMo, the proposed scheme improves the Peak Signal-to-Noise Ratio (PSNR) of video interpolation and prediction by 4.13 dB and 4.26 dB, respectively.
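The abstract's second observation, that soft-thresholding behaves like an activation function, can be illustrated with a minimal sketch. This is not the paper's Transform-Based Tensor-Net: it is classical ISTA for a vector l1-regularized least-squares problem, and every name here (soft_threshold, soft_threshold_relu, ista, objective) is hypothetical and chosen only for illustration. An unfolded network would presumably replace the fixed step size and threshold with parameters learned per phase, which this sketch does not attempt.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise soft-thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def relu(x):
    """Standard rectified linear unit."""
    return np.maximum(x, 0.0)


def soft_threshold_relu(x, tau):
    """Same operator written as a difference of two ReLUs.

    This shows why soft-thresholding can be treated as an activation
    function inside a network layer.
    """
    return relu(x - tau) - relu(-x - tau)


def objective(A, x, y, lam):
    """0.5 * ||A x - y||_2^2 + lam * ||x||_1."""
    residual = A @ x - y
    return 0.5 * residual @ residual + lam * np.sum(np.abs(x))


def ista(A, y, lam, num_iters=200):
    """Classical ISTA for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1.

    Each iteration is a linear map (gradient step) followed by the
    soft-thresholding nonlinearity, i.e. it has the shape of one layer.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # (squared spectral norm of A).
    lipschitz = np.linalg.norm(A, 2) ** 2
    step = 1.0 / lipschitz
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Calling ista(A, y, lam) with a random measurement matrix A and observations y returns a sparse estimate; stacking a fixed number of such iterations and learning their parameters is the general idea behind deep unfolding.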


Pages: 2454-2462

Page count: 9

