Unifying Event Detection and Captioning as Sequence Generation via Pre-training

Cited by: 6
Authors
Zhang, Qi [1]
Song, Yuqing [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Dense video captioning; Pre-training; Sequence generation;
DOI
10.1007/978-3-031-20059-5_21
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Dense video captioning aims to generate text descriptions for a series of events in an untrimmed video, and can be divided into two sub-tasks: event detection and event captioning. Unlike previous works that tackle the two sub-tasks separately, recent works have focused on enhancing the inter-task association between them. However, designing inter-task interactions for event detection and captioning is not trivial due to the large differences in their task-specific solutions. Besides, previous event detection methods normally ignore temporal dependencies between events, leading to event redundancy or inconsistency problems. To tackle the above two defects, in this paper, we define event detection as a sequence generation task and propose a unified pre-training and fine-tuning framework to naturally enhance the inter-task association between event detection and captioning. Since the model predicts each event with previous events as context, the inter-dependency between events is fully exploited and thus our model can detect more diverse and consistent events in the video. Experiments on the ActivityNet dataset show that our model outperforms the state-of-the-art methods, and can be further boosted when pre-trained on extra large-scale video-text data. Code is available at https://github.com/QiQAng/UEDVC.
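A minimal sketch (not the authors' code) of the core idea described above: casting event detection as sequence generation. Event boundaries are quantized into time-bin tokens and all events are serialized into one flat sequence, so an autoregressive decoder predicts each event with previous events as context. The bin count and serialization scheme here are illustrative assumptions.

```python
# Sketch: serialize event spans as a token sequence, so detecting events
# becomes generating this sequence autoregressively (illustrative only).

NUM_BINS = 100  # assumed quantization granularity for the video timeline


def events_to_sequence(events, duration, num_bins=NUM_BINS):
    """Serialize (start, end) events in seconds into a flat token sequence.

    Each boundary is mapped to one of `num_bins` time-bin tokens, and
    events are emitted in temporal order, so earlier events form the
    context for predicting later ones.
    """
    seq = []
    for start, end in sorted(events):
        s_tok = min(int(start / duration * num_bins), num_bins - 1)
        e_tok = min(int(end / duration * num_bins), num_bins - 1)
        seq.extend([s_tok, e_tok])
    return seq


def sequence_to_events(seq, duration, num_bins=NUM_BINS):
    """Invert the serialization back into approximate (start, end) spans."""
    events = []
    for i in range(0, len(seq) - 1, 2):
        start = seq[i] / num_bins * duration
        end = seq[i + 1] / num_bins * duration
        events.append((start, end))
    return events
```

In this formulation a single decoder can emit boundary tokens and caption tokens from one shared vocabulary, which is what allows event detection and captioning to share a unified pre-training and fine-tuning framework.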
Pages: 363-379
Page count: 17
Related papers (50 total)
• [31] Fan, Zhiyun; Zhou, Shiyu; Xu, Bo. Two-Stage Pre-training for Sequence to Sequence Speech Recognition. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
• [32] Yi, Mingyang; Hou, Lu; Sun, Jiacheng; Shang, Lifeng; Jiang, Xin; Liu, Qun; Ma, Zhi-Ming. Improved OOD Generalization via Adversarial Training and Pre-training. International Conference on Machine Learning (ICML), Vol. 139, 2021.
• [33] Zhou, Yucheng; Shen, Tao; Geng, Xiubo; Long, Guodong; Jiang, Daxin. ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1 (Long Papers), 2022: 2559-2575.
• [34] Zhang, Zuobai; Xu, Minghao; Lozano, Aurelie; Chenthamarakshan, Vijil; Das, Payel; Tang, Jian. Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
• [35] Dong, Nanqing; Ericsson, Linus; Yang, Yongxin; Leonardis, Ales; Mcdonagh, Steven. Label-efficient object detection via region proposal network pre-training. Neurocomputing, 2024, 577.
• [36] Wu, Yuhuai; Li, Felix; Liang, Percy. Insights into Pre-training via Simpler Synthetic Tasks. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
• [37] Xie, Jinheng; Ye, Kai; Li, Yudong; Li, Yuexiang; Lin, Kevin Qinghong; Zheng, Yefeng; Shen, Linlin; Shou, Mike Zheng. Learning Visual Prior via Generative Pre-Training. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
• [38] Xu, Zhifeng; Fang, Xianjin; Yang, Gaoming. Malbert: A novel pre-training method for malware detection. Computers & Security, 2021, 111.
• [39] Liu, Can; Gao, Yuncong; Sun, Li; Feng, Jinghua; Yang, Hao; Ao, Xiang. User Behavior Pre-training for Online Fraud Detection. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2022), 2022: 3357-3365.
• [40] Zhang, Yizhe; Wang, Guoyin; Li, Chunyuan; Gan, Zhe; Brockett, Chris; Dolan, Bill. POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020: 8649-8670.