FastClip: An Efficient Video Understanding System with Heterogeneous Computing and Coarse-to-fine Processing

Authors
Zhao, Liming [1]
Sun, Siyang [1]
Zhang, Yanhao [1]
Zheng, Yun [1]
Pan, Pan [1]
Affiliations
[1] Alibaba Group, Hangzhou, People's Republic of China
Keywords
video understanding; heterogeneous computing; system speedup
DOI
10.1145/3487553.3524209
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video media has recently been growing exponentially in many areas, such as e-commerce shopping and gaming. Understanding video content is critical for real-world applications. However, processing long videos is usually time-consuming and expensive. In this paper, we present an efficient video understanding system that aims to speed up video processing with a coarse-to-fine two-stage pipeline and a heterogeneous computing framework. First, we use a coarse but fast multi-modal filtering module to recognize and remove useless video segments from a long video; this module can be deployed on an edge device and reduces the computation required by subsequent processing. Second, several semantic models are applied to finely parse the remaining sequences. To accelerate model inference, we propose a novel heterogeneous computing framework, which trains a model with lightweight and heavyweight backbones to support distributed deployment on a powerful device (e.g., cloud or GPU) and a different device (e.g., edge or CPU). In this way, the model can be both efficient and effective. The proposed system has been widely used in Alibaba, including in "Taobao Live Analysis" and "Commodity Short-Video Generation", where it can achieve a 10x speedup for the system.
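The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Segment` type, the `motion` feature, the threshold value, and both scoring functions are hypothetical stand-ins, chosen only to show how a cheap filter can limit how often an expensive model runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    motion: float   # hypothetical cheap feature in [0, 1]


def coarse_filter(
    segments: List[Segment],
    cheap_score: Callable[[Segment], float],
    threshold: float,
) -> List[Segment]:
    """Stage 1: keep only segments whose cheap score reaches the threshold."""
    return [s for s in segments if cheap_score(s) >= threshold]


def fine_parse(
    segments: List[Segment],
    heavy_model: Callable[[Segment], str],
) -> List[Tuple[Segment, str]]:
    """Stage 2: run the expensive model only on the surviving segments."""
    return [(s, heavy_model(s)) for s in segments]


def run_pipeline(
    segments: List[Segment],
    cheap_score: Callable[[Segment], float],
    heavy_model: Callable[[Segment], str],
    threshold: float = 0.5,  # assumed value, for illustration only
) -> List[Tuple[Segment, str]]:
    kept = coarse_filter(segments, cheap_score, threshold)
    return fine_parse(kept, heavy_model)


# Toy input: four 10-second segments with made-up motion scores.
video = [
    Segment(0, 10, 0.1),
    Segment(10, 20, 0.8),
    Segment(20, 30, 0.6),
    Segment(30, 40, 0.2),
]

heavy_calls: List[Segment] = []  # records how often the costly stage runs


def heavy_model(seg: Segment) -> str:
    """Stand-in for an expensive semantic model."""
    heavy_calls.append(seg)
    return "active" if seg.motion > 0.7 else "moderate"


results = run_pipeline(video, lambda s: s.motion, heavy_model)
```

In this toy run the expensive stage is invoked on two of the four segments; in a real system the saving would depend on how much of the video the coarse filter discards.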
Pages: 67-71
Page count: 5