MEID: Mixture-of-Experts with Internal Distillation for Long-Tailed Video Recognition

Cited by: 0
Authors
Li, Xinjie [1 ]
Xu, Huijuan [1 ]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long-tailed video recognition is especially challenging because videos tend to be long and untrimmed, and a single video may contain multiple classes, causing frame-level class imbalance. Previous work tackles long-tailed video recognition only through frame-level sampling for class rebalancing, without distinguishing the frame-level feature representations of head and tail classes. To improve the frame-level feature representation of tail classes, we modulate the frame-level features with an auxiliary distillation loss that reduces the distribution distance between head and tail classes. Moreover, we design a mixture-of-experts framework with two different expert designs: the first expert, an attention-based classification network, handles the original long-tailed distribution, while the second expert deals with the re-balanced distribution produced by class-balanced sampling. Notably, in the second expert we focus on the frames left unsolved by the first expert through a complementary frame selection module, which inherits the attention weights from the first expert and selects frames with low attention weights; we also enhance the motion feature representation of these selected frames. To highlight the multi-label challenge in long-tailed video recognition, we create two additional benchmarks with the multi-label property based on Charades and CharadesEgo videos, called CharadesLT and CharadesEgoLT. Extensive experiments on the existing long-tailed video benchmark VideoLT and the two new benchmarks verify the effectiveness of our proposed method, which achieves state-of-the-art performance. The code and proposed benchmarks are released at https://github.com/VisionLanguageLab/MEID.
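The complementary frame selection idea in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the toy attention weights, and the plain-Python representation are assumptions; the released code at the GitHub link above is the authoritative version.

```python
def complementary_frame_selection(attn_weights, k):
    """Illustrative sketch: return indices of the k frames that received
    the LOWEST attention weights from the first expert, so the second
    expert can focus on the frames the first expert left "unsolved".

    attn_weights: per-frame attention weights inherited from expert 1.
    """
    # Sort frame indices by ascending attention weight and keep the
    # k least-attended frames.
    order = sorted(range(len(attn_weights)), key=lambda t: attn_weights[t])
    return order[:k]


# Toy usage: 8 frames, attention weights from a hypothetical first expert.
attn = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.6, 0.3]
print(complementary_frame_selection(attn, 3))  # → [3, 1, 5]
```

The selected low-attention frames would then be passed to the second expert, whose motion feature representation is enhanced for exactly these frames.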
Pages: 1451 - 1459
Page count: 9