Object Detection-Based Video Retargeting With Spatial-Temporal Consistency

Cited by: 18

Authors:

Lee, Seung Joon [1]

Lee, Siyeong [2]

Cho, Sung In [3]

Kang, Suk-Ju [1]

Affiliations:

[1] Sogang Univ, Dept Elect Engn, Seoul 04107, South Korea

[2] NAVER LABS, Seongnam Si 13638, South Korea

[3] Dongguk Univ, Dept Multimedia Engn, Seoul 04620, South Korea

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2020 / Vol. 30 / No. 12

Keywords:

Object detection; Object tracking; Distortion; Indexes; Computational complexity; Image sequences; Optimization; object tracking; video retargeting; convolutional neural network;

DOI:

10.1109/TCSVT.2020.2981652

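Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: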

0808 ; 0809 ;

Abstract:

This study proposes a video retargeting method using deep neural network-based object detection. First, the meaningful regions of the input video, denoted by the bounding boxes produced by object detection, are extracted. In this case, the area is defined by considering the size and number of the bounding boxes of the detected objects. The bounding boxes of each frame image are treated as regions of interest (RoIs). Second, a Siamese object tracking network is used to address the high computational complexity of the object detection network. The video is divided into scenes, and object detection is performed on the first frame image of each scene to obtain the first bounding box. Object tracking is then performed on the subsequent frame images until a scene change is detected. Third, the image is resized in the horizontal direction to alter its aspect ratio, and the 1D RoIs of the image are obtained by projecting the bounding boxes in the vertical direction. Then, the proposed method computes the grid map from the 1D RoIs to calculate the new coordinates of each column of the image. Finally, the retargeted video is obtained by rearranging all retargeted frame images. Comparative experiments conducted with various benchmark methods show an average bidirectional similarity score of 1.92, which is higher than those of other conventional methods. The proposed method was stable and satisfied viewers without causing the cognitive discomfort associated with conventional methods.
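The third step of the abstract (projecting bounding boxes to 1D RoIs, building a grid map, and computing new column coordinates) can be illustrated with a minimal NumPy sketch. The abstract does not give the actual grid-map formula, so the rule used here is an assumption: RoI columns keep their width and the remaining columns share the leftover target width. The function names `roi_mask_1d`, `column_grid`, and `retarget_frame` are hypothetical, and neither the detection and tracking networks nor any temporal smoothing is modeled.

```python
import numpy as np

def roi_mask_1d(boxes, width):
    """Project boxes (x1, y1, x2, y2) onto the x-axis.

    Returns a boolean array that is True for every column covered by a box.
    """
    mask = np.zeros(width, dtype=bool)
    for x1, _y1, x2, _y2 in boxes:
        mask[max(0, x1):min(width, x2)] = True
    return mask

def column_grid(mask, target_width):
    """Return the new left edge of each source column, plus the final right edge.

    Assumed rule (not taken from the paper): RoI columns keep width 1 and
    background columns share the remaining width equally.
    """
    width = mask.size
    n_roi = int(mask.sum())
    n_bg = width - n_roi
    if n_bg == 0 or n_roi >= target_width:
        # No room to protect the RoIs: fall back to uniform scaling.
        scale = np.full(width, target_width / width)
    else:
        scale = np.where(mask, 1.0, (target_width - n_roi) / n_bg)
    return np.concatenate(([0.0], np.cumsum(scale)))

def retarget_frame(frame, boxes, target_width):
    """Resize a frame horizontally by nearest-column resampling along the grid."""
    width = frame.shape[1]
    grid = column_grid(roi_mask_1d(boxes, width), target_width)
    # Invert the monotonic grid: find the source coordinate of each output column center.
    out_x = np.arange(target_width) + 0.5
    src_x = np.interp(out_x, grid, np.arange(width + 1))
    src_idx = np.clip(src_x.astype(int), 0, width - 1)
    return frame[:, src_idx]

# Usage: a 2x10 frame whose pixel values equal their column index,
# with one detected box covering columns 3 to 6.
frame = np.tile(np.arange(10), (2, 1))
out = retarget_frame(frame, [(3, 0, 7, 2)], target_width=7)
```

In this toy case the four RoI columns survive unchanged while the six background columns are compressed into three. A per-frame grid like this would flicker on real video, which is presumably why the method relies on tracked boxes between scene changes.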


Pages: 4434-4439

Page count: 6
