FastClip: An Efficient Video Understanding System with Heterogeneous Computing and Coarse-to-fine Processing

Times cited: 0
Authors
Zhao, Liming [1]
Sun, Siyang [1]
Zhang, Yanhao [1]
Zheng, Yun [1]
Pan, Pan [1]
Affiliation
[1] Alibaba Group, Hangzhou, People's Republic of China
Keywords
video understanding; heterogeneous computing; system speedup
DOI
10.1145/3487553.3524209
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, video media have been growing exponentially in many areas, such as e-commerce shopping and gaming. Understanding video content is critical for real-world applications; however, processing long videos is usually time-consuming and expensive. In this paper, we present an efficient video understanding system that speeds up video processing with a coarse-to-fine two-stage pipeline and a heterogeneous computing framework. First, a coarse but fast multi-modal filtering module recognizes and removes useless segments from a long video. This module can be deployed on an edge device, and it reduces the computation required by the next stage. Second, several semantic models finely parse the remaining segments. To accelerate model inference, we propose a novel heterogeneous computing framework that trains a model with both a lightweight and a heavyweight backbone, supporting distributed deployment across a powerful device (e.g., cloud or GPU) and a different device (e.g., edge or CPU). In this way, the model can be both efficient and effective. The proposed system has been widely used at Alibaba, including in "Taobao Live Analysis" and "Commodity Short-Video Generation", where it can achieve a 10x speedup for the system.
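To illustrate why filtering before fine parsing saves compute, here is a minimal Python sketch of a generic coarse-to-fine pipeline. It is not the authors' implementation, and it does not model their heterogeneous backbone training. The `Segment` type, the `coarse_to_fine` and `estimated_speedup` functions, the 0.5 threshold, and the per-second cost model are all hypothetical assumptions. Under that cost model, if the fine stage costs C_fine per second of video, the coarse stage costs C_coarse, and a fraction r of the video survives filtering, then the ideal speedup is C_fine / (C_coarse + r * C_fine), ignoring transfer and scheduling overheads.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)
class Segment:
    """A clip of a long video, in seconds (hypothetical representation)."""
    start: float
    end: float


def coarse_to_fine(
    segments: List[Segment],
    coarse_score: Callable[[Segment], float],
    fine_parse: Callable[[Segment], R],
    threshold: float = 0.5,  # assumed cut-off, not from the paper
) -> Tuple[List[Segment], List[R]]:
    """Generic two-stage processing (a sketch, not FastClip itself).

    Stage 1 (coarse, cheap; could run on an edge device): score every
    segment and drop those scoring below the threshold.
    Stage 2 (fine, expensive; could run on a cloud GPU): run the semantic
    model only on the segments that survived stage 1.
    """
    kept = [s for s in segments if coarse_score(s) >= threshold]
    results = [fine_parse(s) for s in kept]
    return kept, results


def estimated_speedup(coarse_cost: float, fine_cost: float, keep_ratio: float) -> float:
    """Ideal speedup over running the fine model on the whole video.

    Costs are per second of video. Baseline cost: fine_cost.
    Two-stage cost: coarse_cost on everything plus fine_cost on the
    keep_ratio fraction that survives filtering. Overheads are ignored.
    """
    return fine_cost / (coarse_cost + keep_ratio * fine_cost)


# Usage: eight 10-second segments; hypothetical coarse scores mark only
# segments 2 and 5 as informative.
segments = [Segment(10.0 * i, 10.0 * (i + 1)) for i in range(8)]
scores = {2: 0.9, 5: 0.7}
kept, labels = coarse_to_fine(
    segments,
    coarse_score=lambda s: scores.get(int(s.start // 10), 0.1),
    fine_parse=lambda s: f"parsed {s.start:.0f}-{s.end:.0f}s",
)
print(labels)  # ['parsed 20-30s', 'parsed 50-60s']
print(estimated_speedup(coarse_cost=0.0625, fine_cost=1.0,
                        keep_ratio=len(kept) / len(segments)))  # 3.2
```

The formula shows the design trade-off. The speedup is bounded by 1 / r when the coarse stage is nearly free, so the filter must be both cheap and aggressive. A slow coarse model, or one that keeps most of the video, erodes the gain.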
Pages: 67–71
Number of pages: 5