Improving AI-assisted video editing: Optimized footage analysis through multi-task learning

Authors
Li, Yuzhi [1]
Xu, Haojun [1]
Cai, Feifan [1]
Tian, Feng [1]
Affiliations
[1] Shanghai University, Shanghai, People's Republic of China
Keywords
Footage analysis; Multi-task learning; AI-assisted video editing
DOI
10.1016/j.neucom.2024.128485
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, AI-assisted video editing has shown promising applications. Accurately understanding and analyzing camera language is fundamental to video editing because it guides the subsequent editing and production processes. However, many existing methods for camera language analysis pursue classification accuracy at the expense of computational efficiency and deployment requirements. Consequently, they often fail to meet the demands of scenarios with limited computing power, such as mobile devices. To address this challenge, this paper proposes an efficient multi-task camera language analysis pipeline based on shared representations. The approach employs a multi-task learning architecture with hard parameter sharing, in which the different camera language classification tasks use the same low-level feature extraction network and thereby implicitly learn feature representations of the footage. Each classification sub-task then independently learns the high-level semantic information corresponding to its camera language type. This method significantly reduces computational complexity and memory usage while facilitating efficient deployment on devices with limited computing power. Furthermore, to enhance performance, we introduce a dynamic task priority strategy and a conditional dataset downsampling strategy. The experimental results demonstrate that the proposed method achieves a comprehensive accuracy surpassing all previous methods. Moreover, on the 2-task dataset MovieShots, training time was reduced by 66.33%, inference cost by 59.85%, and memory usage by 31.95%; on the 4-task dataset AVE, training time was reduced by 95.34%, inference cost by 97.23%, and memory usage by 61.21%.
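The hard parameter sharing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `HardSharingModel`, the single linear layer standing in for the low-level feature extraction network, the task names, the class counts, and all dimensions are hypothetical placeholders chosen only to keep the example small. It shows one shared backbone feeding several task-specific heads, and compares the weight count against training one separate model per task.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return an out_dim x in_dim weight matrix with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x, relu=False):
    """Multiply the weight matrix by vector x, optionally applying ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


def count_weights(weights):
    """Number of scalar weights in a matrix stored as a list of rows."""
    return sum(len(row) for row in weights)


class HardSharingModel:
    """Toy multi-task model: one shared backbone, one head per task."""

    def __init__(self, in_dim, hidden_dim, task_classes, seed=0):
        rng = random.Random(seed)
        # Shared low-level feature extractor (a single layer here, as an assumption).
        self.backbone = make_linear(in_dim, hidden_dim, rng)
        # One independent classification head per task.
        self.heads = {
            task: make_linear(hidden_dim, n_classes, rng)
            for task, n_classes in task_classes.items()
        }

    def forward(self, x):
        # The shared features are computed once and reused by every head.
        shared = apply_linear(self.backbone, x, relu=True)
        return {task: apply_linear(head, shared) for task, head in self.heads.items()}

    def num_params(self):
        return count_weights(self.backbone) + sum(
            count_weights(head) for head in self.heads.values()
        )


# Hypothetical tasks and class counts, for illustration only.
tasks = {"shot_scale": 5, "shot_movement": 4}
model = HardSharingModel(in_dim=16, hidden_dim=8, task_classes=tasks)
outputs = model.forward([0.5] * 16)

# Baseline for comparison: a separate single-task model for each task.
separate_params = sum(
    HardSharingModel(16, 8, {task: n}).num_params() for task, n in tasks.items()
)

print({task: len(logits) for task, logits in outputs.items()})
print("shared:", model.num_params(), "separate:", separate_params)
```

In this toy setting the shared model holds 200 weights against 328 for two separate models, and the backbone runs once per input instead of once per task. The saving grows with the number of tasks and with the size of the backbone relative to the heads, which is consistent with the larger reductions the abstract reports for the 4-task dataset than for the 2-task one.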
Pages: 10