Cross-modal contrastive learning with multi-hierarchical tracklet clustering for multi-object tracking

Cited by: 0
Authors
Hong, Ru
Yang, Jiming
Cai, Zeyu
Da, Feipeng [1 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
Keywords
Multi-object tracking; Contrastive learning; Multi-modal feature fusion
DOI
10.1016/j.patrec.2025.02.032
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The tracklet-based offline multi-object tracking (MOT) paradigm addresses the long-term association challenge of online tracking by applying global optimization to tracklet clustering across a video. Accurate offline MOT hinges on establishing robust inter-tracklet similarity that exploits both temporal motion and appearance cues. To this end, we propose a multi-hierarchical tracklet clustering method based on cross-modal contrastive learning, called MHCM2DMOT. The method incorporates three key techniques: (I) a tracklet generation strategy based on motion-association uniqueness, which ensures efficient object association across consecutive frames while preserving identity uniqueness; (II) encoding of tracklet motion and appearance cues by language and visual models, with cross-modal contrastive learning strengthening the interaction between the two modalities to yield more discriminative multi-modal fusion similarities; (III) a multi-hierarchical tracklet clustering method using a graph attention network, which balances tracking performance against inference speed. Our tracker achieves state-of-the-art results on popular MOT datasets.
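As context for technique (II), the sketch below shows a symmetric InfoNCE-style cross-modal contrastive loss in which the motion and appearance embeddings of the same tracklet form a positive pair and all other tracklets in the batch act as negatives. The function name, embedding shapes, and temperature value are illustrative assumptions for exposition, not the authors' implementation.

```python
import math

def cross_modal_contrastive_loss(motion_emb, appear_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over two modalities.

    motion_emb, appear_emb: lists of equal length; row i of each is the
    motion / appearance embedding of the same tracklet (the positive pair).
    """
    def normalize(rows):
        # L2-normalize each embedding so dot products are cosine similarities.
        out = []
        for v in rows:
            n = math.sqrt(sum(x * x for x in v))
            out.append([x / n for x in v])
        return out

    m = normalize(motion_emb)
    a = normalize(appear_emb)
    n = len(m)
    # Cross-modal cosine-similarity matrix, sharpened by the temperature.
    logits = [[sum(mi * aj for mi, aj in zip(m[i], a[j])) / temperature
               for j in range(n)] for i in range(n)]

    def nll_diag(rows):
        # Mean negative log-softmax of the diagonal (matching) entries.
        total = 0.0
        for i, row in enumerate(rows):
            mx = max(row)
            lse = mx + math.log(sum(math.exp(x - mx) for x in row))
            total += lse - row[i]
        return total / n

    transposed = [list(col) for col in zip(*logits)]
    # Average the motion->appearance and appearance->motion directions.
    return 0.5 * (nll_diag(logits) + nll_diag(transposed))
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairs are penalized, which is what pushes the two modalities toward a shared, discriminative similarity space.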
Pages: 1–7 (7 pages)
Related papers (50 total; first 10 listed)
  • [1] CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
    Zolfaghari, Mohammadreza
    Zhu, Yi
    Gehler, Peter
    Brox, Thomas
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1430 - 1439
  • [2] MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL
    Zhao, Yiming
    Lu, Haoyu
    Zhao, Shiqi
    Wu, Haoran
    Lu, Zhiwu
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 6390 - 6394
  • [3] Contrastive Multi-Bit Collaborative Learning for Deep Cross-Modal Hashing
    Wu, Qingpeng
    Zhang, Zheng
    Liu, Yishu
    Zhang, Jingyi
    Nie, Liqiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5835 - 5848
  • [4] Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
    Zhang, Hao
    Si, Nianwen
    Chen, Yaqi
    Zhang, Wenlin
    Yang, Xukui
    Qu, Dan
    Zhang, Wei-Qiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1075 - 1086
  • [5] Gated Multi-modal Fusion with Cross-modal Contrastive Learning for Video Question Answering
    Lyu, Chenyang
    Li, Wenxi
    Ji, Tianbo
    Zhou, Liting
    Gurrin, Cathal
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 427 - 438
  • [6] Contrastive cross-modal clustering with twin network
    Mao, Yiqiao
    Yan, Xiaoqiang
    Hu, Shizhe
    Ye, Yangdong
    PATTERN RECOGNITION, 2024, 155
  • [7] Multi-similarity reconstructing and clustering-based contrastive hashing for cross-modal retrieval
    Xie, Conghua
    Gao, Yunmei
    Zhou, Qiyao
    Zhou, Jing
    INFORMATION SCIENCES, 2023, 647
  • [8] Multi-Label Weighted Contrastive Cross-Modal Hashing
    Yi, Zeqian
    Zhu, Xinghui
    Wu, Runbing
    Zou, Zhuoyang
    Liu, Yi
    Zhu, Lei
    APPLIED SCIENCES-BASEL, 2024, 14 (01)
  • [9] Multi-level cross-modal contrastive learning for review-aware recommendation
    Wei, Yibiao
    Xu, Yang
    Zhu, Lei
    Ma, Jingwei
    Peng, Chengmei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [10] Multi-modal Robustness Fake News Detection with Cross-Modal and Propagation Network Contrastive Learning
    Chen, Han
    Wang, Hairong
    Liu, Zhipeng
    Li, Yuhua
    Hu, Yifan
    Zhang, Yujing
    Shu, Kai
    Li, Ruixuan
    Yu, Philip S.
    KNOWLEDGE-BASED SYSTEMS, 2025, 309