Improving AI-assisted video editing: Optimized footage analysis through multi-task learning

Times cited: 0

Authors:

Li, Yuzhi [1]

Xu, Haojun [1]

Cai, Feifan [1]

Tian, Feng [1]

Affiliation:

[1] Shanghai Univ, Shanghai, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Volume 609

Keywords:

Footage analysis; Multi-task learning; AI-assisted video editing;

DOI:

10.1016/j.neucom.2024.128485

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, AI-assisted video editing has shown promising applications. Accurately understanding and analyzing camera language is fundamental to video editing because it guides subsequent editing and production processes. However, many existing methods for camera language analysis overlook computational efficiency and deployment requirements in favor of improving classification accuracy. Consequently, they often fail to meet the demands of scenarios with limited computing power, such as mobile devices. To address this challenge, this paper proposes an efficient multi-task camera language analysis pipeline based on shared representations. This approach employs a multi-task learning architecture with hard parameter sharing, enabling different camera language classification tasks to use the same low-level feature extraction network, thereby implicitly learning feature representations of the footage. Subsequently, each classification sub-task independently learns the high-level semantic information corresponding to its camera language type. This method significantly reduces computational complexity and memory usage while facilitating efficient deployment on devices with limited computing power. Furthermore, to enhance performance, we introduce a dynamic task priority strategy and a conditional dataset downsampling strategy. The experimental results demonstrate that the proposed method achieved a comprehensive accuracy surpassing all previous methods. Moreover, training time was reduced by 66.33%, inference cost decreased by 59.85%, and memory usage decreased by 31.95% on the 2-task dataset MovieShots; on the 4-task dataset AVE, training time was reduced by 95.34%, inference cost decreased by 97.23%, and memory usage decreased by 61.21%.
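The hard parameter sharing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the task names (`shot_scale`, `shot_movement`), their class counts, and the class `SharedTrunkClassifier` are all hypothetical, and a single linear layer stands in for the real feature extraction network. The sketch only shows that one shared trunk is evaluated once per input and feeds several task-specific heads, and that this uses fewer parameters than one full model per task.

```python
import random


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for a linear layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(x, layer):
    """Apply a linear layer to a feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def count_params(layer):
    weights, bias = layer
    return sum(len(row) for row in weights) + len(bias)


class SharedTrunkClassifier:
    """Hypothetical hard-parameter-sharing model: one trunk, one head per task."""

    def __init__(self, n_in, n_hidden, task_classes, seed=0):
        rng = random.Random(seed)
        # Shared low-level feature extractor (a stand-in for a real backbone).
        self.trunk = make_linear(n_in, n_hidden, rng)
        # Task-specific heads that learn high-level semantics independently.
        self.heads = {
            name: make_linear(n_hidden, n_classes, rng)
            for name, n_classes in task_classes.items()
        }

    def forward(self, x):
        features = relu(linear(x, self.trunk))  # computed once for all tasks
        return {name: linear(features, head) for name, head in self.heads.items()}

    def num_params(self):
        heads = sum(count_params(h) for h in self.heads.values())
        return count_params(self.trunk) + heads


# Assumed task names and class counts, chosen only for illustration.
TASKS = {"shot_scale": 5, "shot_movement": 4}

model = SharedTrunkClassifier(n_in=16, n_hidden=8, task_classes=TASKS)
logits = model.forward([0.5] * 16)

# Baseline for comparison: one independent single-task model per task.
separate = sum(
    SharedTrunkClassifier(16, 8, {name: k}).num_params() for name, k in TASKS.items()
)
print(model.num_params(), separate)
```

In this toy setting the shared model stores the trunk once, so the saving grows with the number of tasks and with the size of the trunk relative to the heads. The percentages reported in the abstract come from the paper's experiments and cannot be reproduced from this sketch.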


Pages: 10
