A Hierarchical Framwork with Improved Loss for Large-scale Multi-modal Video Identification

被引:0
|
作者
Zhang, Shichuan [1 ]
Tang, Zengming [1 ]
Pan, Hao [1 ]
Wei, Xinyu [1 ]
Huang, Jun [1 ]
机构
[1] Shanghai Adv Res Inst, Shanghai, Peoples R China
关键词
video identification; models combination; feature fusion; improved loss function;
D O I
10.1145/3343031.3356074
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
This paper introduces our solution for iQIYI Celebrity Video Identification Challenge. After analyzing the iQIYI-VID-2019 dataset, we find the distribution of the dataset is very unbalanced and there are many unlabeled samples in the validation set and the test set. For these challenge, we propose a hierarchical system which combines different models and fuses base classifiers. For the false detections and low-quality features in the dataset, we use a simple and reasonable strategy to fuse features. In order to detect videos more accurately, we choose an improved loss function for the learning of base classifiers. Experiment results show that our framework performs well and evaluation conducted by the organizers shows that our final result gets the ninth place online and mAP 88.08%.
引用
收藏
页码:2539 / 2542
页数:4
相关论文
共 50 条
  • [1] Towards Good Practices for Multi-modal Fusion in Large-Scale Video Classification
    Liu, Jinlai
    Yuan, Zehuan
    Wang, Changhu
    [J]. COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV, 2019, 11132 : 287 - 296
  • [2] Efficient Large-Scale Multi-Modal Classification
    Kiela, Douwe
    Grave, Edouard
    Joulin, Armand
    Mikolov, Tomas
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5198 - 5204
  • [3] Multi-Modal Learning: Study on A Large-Scale Micro-Video Data Collection
    Chen, Jingyuan
    [J]. MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 1454 - 1458
  • [4] Large-scale Multi-modal Search and QA at Alibaba
    Jin, Rong
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 8 - 8
  • [5] MMpedia: A Large-Scale Multi-modal Knowledge Graph
    Wu, Yinan
    Wu, Xiaowei
    Li, Junwen
    Zhang, Yue
    Wang, Haofen
    Du, Wen
    He, Zhidong
    Liu, Jingping
    Ruan, Tong
    [J]. SEMANTIC WEB, ISWC 2023, PT II, 2023, 14266 : 18 - 37
  • [6] Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
    Zeng, Zhaoyang
    Luo, Yongsheng
    Liu, Zhenhua
    Rao, Fengyun
    Li, Dian
    Guo, Weidong
    Wen, Zhen
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3128 - 3137
  • [7] Exploring a large-scale multi-modal transportation recommendation system
    Liu, Yang
    Lyu, Cheng
    Liu, Zhiyuan
    Cao, Jinde
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2021, 126
  • [8] Richpedia: A Large-Scale, Comprehensive Multi-Modal Knowledge Graph
    Wang, Meng
    Wang, Haofen
    Qi, Guilin
    Zheng, Qiushuo
    [J]. BIG DATA RESEARCH, 2020, 22
  • [9] Operational planning of a large-scale multi-modal transportation system
    Jansen, B
    Swinkels, PCJ
    Teeuwen, GJA
    de Fluiter, BV
    Fleuren, HA
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2004, 156 (01) : 41 - 53
  • [10] Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
    Niu, Yulei
    Lu, Zhiwu
    Wen, Ji-Rong
    Xiang, Tao
    Chang, Shih-Fu
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (04) : 1720 - 1731