Sparse Representation With Spatio-Temporal Online Dictionary Learning for Promising Video Coding

Cited by: 11
Authors
Dai, Wenrui [1, 2]
Shen, Yangmei [2]
Tang, Xin [2]
Zou, Junni [3]
Xiong, Hongkai [2]
Chen, Chang Wen [4]
Affiliations
[1] Univ Calif San Diego, Dept Biomed Informat, La Jolla, CA 92093 USA
[2] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[3] Shanghai Univ, Key Lab Special Fiber Opt & Opt Access Network, Shanghai 200072, Peoples R China
[4] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Online dictionary learning; sparse representation; video coding; stochastic gradient descent; K-SVD; IMAGE QUALITY ASSESSMENT; SUPERRESOLUTION; PREDICTION; ALGORITHM
DOI
10.1109/TIP.2016.2594490
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Classical dictionary learning methods for video coding suffer from high computational complexity, and their coding efficiency is compromised because they disregard the underlying data distribution. This paper proposes a spatio-temporal online dictionary learning (STOL) algorithm that speeds up the convergence of dictionary learning while providing a guarantee on the approximation error. The proposed algorithm uses stochastic gradient descent to form a dictionary of pairs of 3D low-frequency and high-frequency spatio-temporal volumes. In each iteration of the learning process, it randomly selects one sample volume and updates the dictionary atoms by minimizing the expected cost, rather than optimizing the empirical cost over the complete training data as batch learning methods such as K-SVD do. Since the selected volumes are assumed to be independent and identically distributed samples from the underlying distribution, the decomposition coefficients obtained from the trained dictionary are well suited to sparse representation. It is proved theoretically that STOL can achieve a better approximation for sparse representation than K-SVD while maintaining both structured sparsity and hierarchical sparsity. STOL is shown to outperform batch gradient descent methods (K-SVD) in convergence speed and computational complexity, and its upper bound on the prediction error is asymptotically equal to the training error. Extensive experiments validate that, with lower computational complexity, the STOL-based coding scheme improves rate-distortion performance and visual quality over H.264/AVC, High Efficiency Video Coding (HEVC), and existing super-resolution-based methods.
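The abstract contrasts an online update, which draws one sample per iteration, with batch K-SVD. The sketch below is a generic online dictionary learning loop in that spirit, not the authors' STOL implementation. ISTA is swapped in as the sparse coder, and each atom gets a projected stochastic gradient step. Each training vector stands in for a flattened spatio-temporal volume. A low/high-frequency pair could be handled by concatenating both halves so that they share one sparse code, but that is an assumption, not a detail given in the abstract. All function names, the `lr / sqrt(t)` step-size schedule, and the parameters `lam`, `lr`, `iters`, and `seed` are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def residual(D, x, a):
    """Return r = x - D a, with D stored as a list of atoms (columns)."""
    r = list(x)
    for coef, d in zip(a, D):
        if coef:
            for i, v in enumerate(d):
                r[i] -= coef * v
    return r


def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t*|v|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(D, x, lam=0.05, steps=50):
    """Approximate argmin_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    ISTA is a stand-in sparse coder chosen for this sketch.
    """
    # Step size 1/L; the sum of squared atom norms (trace of D^T D)
    # upper-bounds the largest eigenvalue of D^T D.
    L = sum(dot(d, d) for d in D) or 1.0
    a = [0.0] * len(D)
    for _ in range(steps):
        r = residual(D, x, a)
        # Gradient step on the quadratic term (adds D^T r / L), then shrinkage.
        a = [soft_threshold(a[j] + dot(D[j], r) / L, lam / L) for j in range(len(D))]
    return a


def normalise(v):
    """Return a unit-norm copy of v (zero vectors are copied unchanged)."""
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v] if n > 0 else list(v)


def project_unit_ball(v):
    """Rescale v onto the unit ball only if its norm exceeds 1."""
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v] if n > 1.0 else v


def online_dictionary_learning(samples, k, iters=2000, lr=0.1, lam=0.05, seed=0):
    """Hypothetical online learner: one randomly drawn sample per iteration."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise the k atoms with normalised, randomly chosen training samples.
    D = [normalise(rng.choice(samples)) for _ in range(k)]
    for t in range(1, iters + 1):
        x = rng.choice(samples)  # a single sample, not the whole training set
        a = sparse_code(D, x, lam)
        r = residual(D, x, a)
        step = lr / math.sqrt(t)  # decaying step size (assumed schedule)
        for j in range(k):
            if a[j]:
                # d_j <- d_j + step * a_j * r is a stochastic gradient step on
                # 0.5*||x - D a||^2, followed by projection onto ||d_j|| <= 1.
                D[j] = project_unit_ball([D[j][i] + step * a[j] * r[i] for i in range(dim)])
    return D
```

Consider a call such as `online_dictionary_learning(vectors, k=64)`, where `vectors` is a hypothetical list of flattened volumes (concatenated low/high-frequency pairs, if pairs are used). It would return 64 atoms. Each iteration sparse-codes a single sample, whereas a K-SVD pass sparse-codes the entire training set before its atom updates. Online methods accept noisier updates in exchange for this lower cost per iteration.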
Pages: 4580-4595
Number of pages: 16
Related papers (50 in total)
  • [1] Sparse Spatio-Temporal Representation With Adaptive Regularized Dictionary Learning for Low Bit-Rate Video Coding
    Xiong, Hongkai
    Pan, Zhiming
    Ye, Xinwei
    Chen, Chang Wen
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2013, 23 (04) : 710 - 728
  • [2] STOL: Spatio-Temporal Online Dictionary Learning for low bit-rate video coding
    Tang, Xin
    Xiong, Hongkai
    [J]. 2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 522 - 522
  • [3] Sparse coding and dictionary learning for spike trains to find spatio-temporal patterns
    Tezuka, Taro
    [J]. BMC Neuroscience, 16 (Suppl 1)
  • [4] Spatio-Temporal Crop Aggregation for Video Representation Learning
    Sameni, Sepehr
    Jenni, Simon
    Favaro, Paolo
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5641 - 5651
  • [5] Video representation learning by identifying spatio-temporal transformations
    Geng, Sheng
    Zhao, Shimin
    Liu, Hu
    [J]. Applied Intelligence, 2022, 52 : 6613 - 6622
  • [6] Video representation learning by identifying spatio-temporal transformations
    Geng, Sheng
    Zhao, Shimin
    Liu, Hu
    [J]. APPLIED INTELLIGENCE, 2022, 52 (06) : 6613 - 6622
  • [7] Sparse Spatio-Temporal Representation with Adaptive Regularized Dictionaries for Super-Resolution Based Video Coding
    Pan, Zhiming
    Xiong, Hongkai
    [J]. 2012 DATA COMPRESSION CONFERENCE (DCC), 2012, : 139 - 148
  • [8] Spatio-temporal Video Representation with Locality-Constrained Linear Coding
    Al Ghamdi, Manal
    Al Harbi, Nouf
    Gotoh, Yoshihiko
    [J]. COMPUTER VISION - ECCV 2012, PT III, 2012, 7585 : 101 - 110
  • [9] Learning Spatio-temporal Representation by Channel Aliasing Video Perception
    Lin, Yiqi
    Wang, Jinpeng
    Zhang, Manlin
    Ma, Andy J.
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2317 - 2325
  • [10] Video coding with spatio-temporal texture synthesis
    Zhu, Chunbo
    Sun, Xiaoyan
    Wu, Feng
    Li, Houqiang
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 112 - +