Object Detection-Based Video Retargeting With Spatial-Temporal Consistency

Cited by: 18
Authors
Lee, Seung Joon [1]
Lee, Siyeong [2]
Cho, Sung In [3]
Kang, Suk-Ju [1]
Affiliations
[1] Sogang Univ, Dept Elect Engn, Seoul 04107, South Korea
[2] NAVER LABS, Seongnam Si 13638, South Korea
[3] Dongguk Univ, Dept Multimedia Engn, Seoul 04620, South Korea
Keywords
Object detection; Object tracking; Distortion; Indexes; Computational complexity; Image sequences; Optimization; video retargeting; convolutional neural network
DOI
10.1109/TCSVT.2020.2981652
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
This study proposes a video retargeting method based on deep neural network object detection. First, the meaningful regions of the input video are extracted as the bounding boxes produced by object detection; these regions are defined by considering the size and number of the bounding boxes of the detected objects. The bounding boxes in each frame are treated as regions of interest (RoIs). Second, a Siamese object tracking network is used to reduce the high computational complexity of the object detection network. The video is divided into scenes, and object detection is performed only on the first frame of each scene to obtain the initial bounding boxes; object tracking is then performed on each subsequent frame until a scene change is detected. Third, to change the aspect ratio, each frame is resized in the horizontal direction; for this, 1D RoIs are obtained by projecting the bounding boxes in the vertical direction. The proposed method then computes a grid map from the 1D RoIs to calculate the new coordinates of each column of the image. Finally, the retargeted video is obtained by combining all retargeted frames in sequence. Comparative experiments with various benchmark methods show an average bidirectional similarity score of 1.92, which is higher than those of the conventional methods. The proposed method was stable and satisfied viewers without causing the cognitive discomfort produced by conventional methods.
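The pipeline in the abstract (detect on the first frame of each scene, track elsewhere, project boxes to a 1D RoI, then remap columns through a grid map) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the detector, the Siamese tracker, and the scene-change detector are left abstract, all function names are hypothetical, and the grid-map rule used here (RoI columns keep unit width, non-RoI columns absorb all of the resizing, uniform scaling as a fallback) is an assumption chosen for illustration, since this record does not give the paper's actual grid-map formula.

```python
import bisect
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, x1 exclusive.
Box = Tuple[int, int, int, int]


def plan_detect_or_track(scene_cut_flags: Sequence[bool]) -> List[str]:
    """Run the expensive detector on the first frame of each scene and the
    cheaper tracker on every other frame. scene_cut_flags[i] is True when
    frame i starts a new scene; frame 0 always starts one."""
    return ["detect" if i == 0 or cut else "track"
            for i, cut in enumerate(scene_cut_flags)]


def project_boxes_to_1d_roi(boxes: Sequence[Box], width: int) -> List[bool]:
    """Collapse 2D boxes along the vertical axis into a per-column RoI mask."""
    roi = [False] * width
    for x0, _y0, x1, _y1 in boxes:
        for x in range(max(0, x0), min(width, x1)):
            roi[x] = True
    return roi


def column_scales(roi: Sequence[bool], target_width: int) -> List[float]:
    """ASSUMED grid-map rule: give each source column a horizontal scale so
    that the scales sum to target_width. RoI columns keep scale 1.0 when
    possible; non-RoI columns absorb all of the width change."""
    width = len(roi)
    n_roi = sum(roi)
    n_bg = width - n_roi
    if n_roi <= target_width and n_bg > 0:
        bg_scale = (target_width - n_roi) / n_bg
        return [1.0 if is_roi else bg_scale for is_roi in roi]
    # RoIs do not fit (or there is no background): fall back to uniform scaling.
    return [target_width / width] * width


def column_edges(scales: Sequence[float]) -> List[float]:
    """New left edge of every source column (cumulative sum of scales).
    The last entry is the right edge of the final column (= target width)."""
    edges = [0.0]
    for s in scales:
        edges.append(edges[-1] + s)
    return edges


def retarget_row(row: Sequence, edges: Sequence[float], target_width: int) -> list:
    """Backward-map each output column centre to the source column whose new
    interval [edges[i], edges[i + 1]) contains it (nearest-neighbour sampling)."""
    out = []
    for j in range(target_width):
        i = bisect.bisect_right(edges, j + 0.5) - 1
        out.append(row[min(max(i, 0), len(row) - 1)])
    return out


def retarget_frame(frame: Sequence[Sequence], boxes: Sequence[Box],
                   target_width: int) -> List[list]:
    """Apply one column mapping to every row, so vertical lines stay straight."""
    roi = project_boxes_to_1d_roi(boxes, len(frame[0]))
    edges = column_edges(column_scales(roi, target_width))
    return [retarget_row(row, edges, target_width) for row in frame]
```

For a 10-column row `list(range(10))` with one box covering columns 3 to 5 and a target width of 6, `retarget_frame` returns `[1, 3, 4, 5, 6, 8]` for that row: the three RoI columns survive side by side, and only background columns are dropped. Because the mapping depends only on column indices, the same 1D RoI yields the same warp for every row of the frame.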
Pages: 4434-4439
Page count: 6