Video Synthesis via Transform-Based Tensor Neural Network

Cited by: 7

Authors:

Zhang, Yimeng [1, 2]

Liu, Xiao-Yang [1, 2]

Wu, Bo [3]

Walid, Anwar [4]

Affiliations:

[1] Tensor & Deep Learning Lab, New York, NY USA

[2] Columbia Univ, New York, NY USA

[3] MIT IBM Watson AI Lab, Cambridge, MA 02142 USA

[4] Nokia Bell Labs, Murray Hill, NJ USA

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Keywords:

Video synthesis; transform-based tensor; tensor neural network; interpolation and prediction; deep unfolding; STABILIZATION; SHRINKAGE

DOI:

10.1145/3394171.3413527

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video frame synthesis is an important task in computer vision and has drawn great interest across a wide range of applications. However, existing neural network methods do not explicitly impose tensor low-rankness on videos to capture spatiotemporal correlations in a high-dimensional space, while existing iterative algorithms require hand-crafted parameters and have relatively long running times. In this paper, we propose a novel multi-phase deep neural network, Transform-Based Tensor-Net, which exploits the low-rank structure of video data in a learned transform domain by unfolding an Iterative Shrinkage-Thresholding Algorithm (ISTA) for tensor signal recovery. Our design is based on two observations: (i) both linear and nonlinear transforms can be implemented by a neural network layer, and (ii) the soft-thresholding operator corresponds to an activation function. Further, such an unfolded design achieves nearly real-time performance at the cost of training time and is interpretable as a byproduct. Experimental results on the KTH and UCF-101 datasets show that, compared with the state-of-the-art methods DVF and Super SloMo, the proposed scheme improves the Peak Signal-to-Noise Ratio (PSNR) of video interpolation and prediction by 4.13 dB and 4.26 dB, respectively.
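The two observations in the abstract can be illustrated with a small sketch. It is not the paper's Transform-Based Tensor-Net: it shows plain ISTA for the vector problem min_x 0.5*||Ax - y||^2 + lam*||x||_1, with the soft-thresholding step written as a difference of two ReLU activations. The names `relu`, `soft_threshold`, and `ista`, and the parameters `lam`, `step`, and `n_iter`, are hypothetical choices made for this illustration. In a deep-unfolding design, each loop iteration would typically become one network phase with its own learned parameters; that part is assumed here and not shown.

```python
def relu(x):
    """Rectified linear unit on a scalar."""
    return max(x, 0.0)


def soft_threshold(x, tau):
    """Soft-thresholding (shrinkage) written as a difference of two ReLUs.

    Equivalent to sign(x) * max(|x| - tau, 0) for tau >= 0.
    """
    return relu(x - tau) - relu(-x - tau)


def ista(A, y, lam, step, n_iter):
    """Plain ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1 (vector case).

    A is a list of rows, y a list of floats. This is an illustrative
    stand-in for the tensor recovery problem described in the abstract.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the quadratic term: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by shrinkage with threshold step * lam
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy usage: with A equal to the identity, ISTA simply shrinks y by lam.
identity = [[1.0, 0.0], [0.0, 1.0]]
x_hat = ista(identity, [3.0, 0.5], lam=1.0, step=1.0, n_iter=5)
print(x_hat)  # [2.0, 0.0]
```

The toy case uses an identity matrix so the result can be checked by hand: the entry 3.0 is shrunk to 2.0 and the entry 0.5, being below the threshold, is set to zero.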

Pages: 2454-2462

Number of pages: 9
