A Hierarchical Framework with Improved Loss for Large-scale Multi-modal Video Identification

Cited by: 0
Authors
Zhang, Shichuan [1 ]
Tang, Zengming [1 ]
Pan, Hao [1 ]
Wei, Xinyu [1 ]
Huang, Jun [1 ]
Affiliations
[1] Shanghai Adv Res Inst, Shanghai, Peoples R China
Keywords
video identification; model combination; feature fusion; improved loss function
DOI
10.1145/3343031.3356074
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
This paper introduces our solution to the iQIYI Celebrity Video Identification Challenge. After analyzing the iQIYI-VID-2019 dataset, we find that its class distribution is highly imbalanced and that the validation and test sets contain many unlabeled samples. To address these challenges, we propose a hierarchical system that combines different models and fuses base classifiers. To cope with false detections and low-quality features in the dataset, we use a simple and reasonable strategy to fuse features. To identify videos more accurately, we adopt an improved loss function for training the base classifiers. Experimental results show that our framework performs well, and the evaluation conducted by the organizers places our final submission ninth on the online leaderboard with an mAP of 88.08%.
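Note: the abstract describes the feature fusion and the improved loss only at a high level. As a rough illustration of the kind of quality-weighted fusion it suggests (frames with false detections or low-quality features contribute less to the video-level embedding), a minimal NumPy sketch follows; the function name, quality threshold, and softmax weighting are illustrative assumptions, not the authors' published method.

    # A minimal sketch (not the authors' published code) of quality-weighted
    # feature fusion: per-frame face embeddings are pooled into one video-level
    # embedding, with low-quality frames and likely false detections down-weighted.
    import numpy as np

    def fuse_video_features(frame_feats, quality, min_quality=0.3, temperature=0.1):
        """frame_feats: (N, D) L2-normalized frame embeddings;
        quality: (N,) detection/quality scores in [0, 1]."""
        keep = quality >= min_quality          # drop likely false detections
        if not keep.any():                     # fall back to all frames if none pass
            keep = np.ones_like(quality, dtype=bool)
        feats, q = frame_feats[keep], quality[keep]
        w = np.exp(q / temperature)            # softmax over quality scores
        w /= w.sum()
        fused = (w[:, None] * feats).sum(axis=0)
        return fused / (np.linalg.norm(fused) + 1e-12)   # re-normalize to unit length

    # Example: 5 frames of 512-d features with mixed quality scores.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 512))
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    video_feat = fuse_video_features(feats, np.array([0.9, 0.8, 0.2, 0.7, 0.1]))
    print(video_feat.shape)   # -> (512,)

A margin-based softmax (e.g., ArcFace-style) is a common choice for the "improved loss" in such identification tasks, but the abstract does not specify which loss is used.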
Pages: 2539 - 2542
Number of pages: 4
Related Papers
50 items in total
  • [41] Multi-modal artificial dura for simultaneous large-scale optical access and large-scale electrophysiology in non-human primate cortex
    Griggs, Devon J.
    Khateeb, Karam
    Zhou, Jasmine
    Liu, Teng
    Wang, Ruikang
    Yazdan-Shahmorad, Azadeh
    [J]. JOURNAL OF NEURAL ENGINEERING, 2021, 18 (05)
  • [42] Hierarchical Multi-Modal Prompting Transformer for Multi-Modal Long Document Classification
    Liu, Tengfei
    Hu, Yongli
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6376 - 6390
  • [43] Large Scale Multi-Lingual Multi-Modal Summarization Dataset
    Verma, Yash
    Jangra, Anubhav
    Kumar, Raghvendra
    Saha, Sriparna
    [J]. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 3620 - 3632
  • [44] IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments
    Soliman, Abanob
    Bonardi, Fabien
    Sidibe, Desire
    Bouchafa, Samia
    [J]. JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2022, 106 (03)
  • [46] WenLan: Efficient Large-Scale Multi-Modal Pre-Training on Real World Data
    Song, Ruihua
    [J]. MMPT '21: PROCEEDINGS OF THE 2021 WORKSHOP ON MULTI-MODAL PRE-TRAINING FOR MULTIMEDIA UNDERSTANDING, 2021, : 3 - 3
  • [47] Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study
    Ko, Myeongseob
    Jin, Ming
    Wang, Chenguang
    Jia, Ruoxi
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4848 - 4858
  • [48] Semantic-Driven Interpretable Deep Multi-Modal Hashing for Large-Scale Multimedia Retrieval
    Lu, Xu
    Liu, Li
    Nie, Liqiang
    Chang, Xiaojun
    Zhang, Huaxiang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 4541 - 4554
  • [49] Application of smart card data in validating a large-scale multi-modal transit assignment model
    Tavassoli, Ahmad
    Mesbah, M.
    Hickman, M.
    [J]. 2018, Springer Verlag, 10 : 1 - 21
  • [50] GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction
    Qin, Yiming
    Chi, Xiaoyu
    Sheng, Bin
    Lau, Rynson W. H.
    [J]. THE VISUAL COMPUTER, 2023, 39 : 3597 - 3607