MEID: Mixture-of-Experts with Internal Distillation for Long-Tailed Video Recognition

Cited by: 0
Authors
Li, Xinjie [1 ]
Xu, Huijuan [1 ]
Affiliation
[1] Penn State Univ, University Pk, PA 16802 USA
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Long-tailed video recognition is especially challenging because videos tend to be long and untrimmed, and each video may contain multiple classes, causing frame-level class imbalance. Previous work tackles long-tailed video recognition only through frame-level sampling for class rebalancing, without distinguishing the frame-level feature representations of head and tail classes. To improve the frame-level feature representation of tail classes, we modulate the frame-level features with an auxiliary distillation loss that reduces the distribution distance between head and tail classes. Moreover, we design a mixture-of-experts framework with two different expert designs: the first expert, an attention-based classification network, handles the original long-tailed distribution, while the second expert deals with the re-balanced distribution produced by class-balanced sampling. Notably, the second expert focuses specifically on frames left unresolved by the first expert via a complementary frame selection module, which inherits the attention weights from the first expert and selects the frames with low attention weights; we also enhance the motion feature representation of these selected frames. To highlight the multi-label challenge in long-tailed video recognition, we create two additional benchmarks with the multi-label property based on Charades and CharadesEgo videos, called CharadesLT and CharadesEgoLT. Extensive experiments on the existing long-tailed video benchmark VideoLT and the two new benchmarks verify the effectiveness of our proposed method, which achieves state-of-the-art performance. The code and proposed benchmarks are released at https://github.com/VisionLanguageLab/MEID.
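The complementary frame selection described in the abstract can be sketched as follows: the second expert receives the frames that the first expert's attention largely ignored. This is a minimal illustrative sketch, not the authors' released code; the function name, interface, and attention values are assumptions.

```python
import numpy as np

def complementary_frame_selection(attn_weights, k):
    """Select the k frames with the LOWEST attention weights inherited
    from the first expert, so the second expert concentrates on frames
    the first expert under-attended (illustrative sketch of MEID's
    complementary frame selection; interface is an assumption)."""
    lowest = np.argsort(attn_weights)[:k]  # ascending sort: lowest attention first
    return np.sort(lowest)                 # restore temporal order for the clip

# Hypothetical attention weights over 8 frames from the first expert
attn = np.array([0.30, 0.05, 0.20, 0.02, 0.15, 0.10, 0.08, 0.10])
print(complementary_frame_selection(attn, 3))  # frames 1, 3, 6
```

These selected frames would then be passed to the second expert, whose motion feature representation is enhanced for exactly this subset.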
Pages: 1451-1459 (9 pages)
Related Papers (50 records)
  • [1] Mixture-of-Experts Learner for Single Long-Tailed Domain Generalization
    Wang, Mengzhu
    Yuan, Jianlong
    Wang, Zhibin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 290 - 299
  • [2] MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition
    Zhao, Qihao
    Jiang, Chen
    Hu, Wei
    Zhang, Fan
    Liu, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11563 - 11574
  • [3] Balanced Product of Calibrated Experts for Long-Tailed Recognition
    Aimar, Emanuel Sanchez
    Jonnarth, Arvi
    Felsberg, Michael
    Kuhlmann, Marco
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19967 - 19977
  • [4] Balanced self-distillation for long-tailed recognition
    Ren, Ning
    Li, Xiaosong
    Wu, Yanxia
    Fu, Yan
    KNOWLEDGE-BASED SYSTEMS, 2024, 290
  • [5] Self Supervision to Distillation for Long-Tailed Visual Recognition
    Li, Tianhao
    Wang, Limin
    Wu, Gangshan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 610 - 619
  • [6] MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
    Xie, Zhitian
    Zhang, Yinger
    Zhuang, Chenyi
    Shi, Qitao
    Liu, Zhining
    Gu, Jinjie
    Zhang, Guannan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16067 - 16075
  • [7] Virtual Student Distribution Knowledge Distillation for Long-Tailed Recognition
    Liu, Haodong
    Huang, Xinlei
    Tang, Jialiang
    Jiang, Ning
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 406 - 419
  • [8] Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts
    Yi, S.-Y.
    Mao, Z.
    Ju, W.
    Zhou, Y.-D.
    Liu, L.
    Luo, X.
    Zhang, M.
    IEEE Transactions on Big Data, 2023, 9 (06): : 1683 - 1696
  • [9] Relational Subsets Knowledge Distillation for Long-Tailed Retinal Diseases Recognition
    Ju, Lie
    Wang, Xin
    Wang, Lin
    Liu, Tongliang
    Zhao, Xin
    Drummond, Tom
    Mahapatra, Dwarikanath
    Ge, Zongyuan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VIII, 2021, 12908 : 3 - 12
  • [10] VideoLT: Large-scale Long-tailed Video Recognition
    Zhang, Xing
    Wu, Zuxuan
    Weng, Zejia
    Fu, Huazhu
    Chen, Jingjing
    Jiang, Yu-Gang
    Davis, Larry
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7940 - 7949