Cross-modal contrastive learning with multi-hierarchical tracklet clustering for multi-object tracking

Cited by: 0
Authors
Hong, Ru
Yang, Jiming
Cai, Zeyu
Da, Feipeng [1]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
Keywords
Multi-object tracking; Contrastive learning; Multi-modal feature fusion
DOI
10.1016/j.patrec.2025.02.032
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The tracklet-based offline multi-object tracking (MOT) paradigm addresses the challenge of long-term association in online mode by using global optimization to cluster tracklets across a video. Accurate offline MOT hinges on establishing robust similarity between tracklets by leveraging both their temporal motion and appearance cues. To this end, we propose a multi-hierarchical tracklet clustering method based on cross-modal contrastive learning, called MHCM2DMOT. The method incorporates three key techniques: (I) a tracklet generation strategy based on motion-association uniqueness, which ensures efficient object association across consecutive frames while preserving identity uniqueness; (II) encoding of tracklet motion and appearance cues through language and visual models, respectively, with cross-modal contrastive learning enhancing the interaction between the two modal features to produce more discriminative multi-modal fusion similarities; (III) a multi-hierarchical tracklet clustering method using a graph attention network, which balances tracking performance against inference speed. Our tracker achieves state-of-the-art results on popular MOT datasets.
Pages: 1-7
Number of pages: 7
Related papers
50 records in total (items [31]-[40] shown)
  • [31] Multi-grained Representation Learning for Cross-modal Retrieval
    Zhao, Shengwei
    Xu, Linhai
    Liu, Yuying
    Du, Shaoyi
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023: 2194-2198
  • [32] Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval
    Qian, Shengsheng
    Xue, Dizhan
    Fang, Quan
    Xu, Changsheng
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04): 4794-4811
  • [33] Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search
    Liang, Meiyu
    Du, Junping
    Liang, Zhengyang
    Xing, Yongwang
    Huang, Wei
    Xue, Zhe
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024: 13744-13753
  • [34] Prototype-based cross-modal object tracking
    Liu, Lei
    Li, Chenglong
    Wang, Futian
    Shen, Longfeng
    Tang, Jin
    INFORMATION FUSION, 2025, 118
  • [35] mmMCL3DMOT: Multi-Modal Momentum Contrastive Learning for 3D Multi-Object Tracking
    Hong, Ru
    Yang, Jiming
    Zhou, Weidian
    Da, Feipeng
IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 1895-1899
  • [36] Complementarity is the king: Multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval
    Pei, Xinlei
    Liu, Zheng
    Gao, Shanshan
    Su, Yijun
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 216
  • [37] Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch
    Dey, Sounak
    Dutta, Anjan
    Ghosh, Suman K.
    Valveny, Ernest
    Llados, Josep
    Pal, Umapada
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018: 916-921
  • [38] Cross-modal contrastive learning for aspect-based recommendation
    Won, Heesoo
    Oh, Byungkook
    Yang, Hyeongjun
    Lee, Kyong-Ho
    INFORMATION FUSION, 2023, 99
  • [39] A Cross-modal image retrieval method based on contrastive learning
    Zhou, Wen
JOURNAL OF OPTICS-INDIA, 2024, 53 (03): 2098-2107
  • [40] Cross-Modal Contrastive Learning for Text-to-Image Generation
    Zhang, Han
    Koh, Jing Yu
    Baldridge, Jason
    Lee, Honglak
    Yang, Yinfei
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 833-842