Cross-modal contrastive learning with multi-hierarchical tracklet clustering for multi-object tracking

Cited: 0
Authors
Hong, Ru
Yang, Jiming
Cai, Zeyu
Da, Feipeng [1 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
Keywords
Multi-object tracking; Contrastive learning; Multi-modal feature fusion;
DOI
10.1016/j.patrec.2025.02.032
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The tracklet-based offline multi-object tracking (MOT) paradigm addresses the challenge of long-term association in online mode by utilizing global optimization for tracklet clustering in videos. The key to accurate offline MOT lies in establishing robust similarity between tracklets by leveraging both their temporal motion and appearance cues. To this end, we propose a multi-hierarchical tracklet clustering method based on cross-modal contrastive learning, called MHCM2DMOT. This method incorporates three key techniques: (I) A tracklet generation strategy based on motion association uniqueness, which ensures efficient object association across consecutive frames while preserving identity uniqueness; (II) Encoding tracklet motion and appearance cues through both language and visual models, enhancing interaction between different modal features via cross-modal contrastive learning to produce more distinct multi-modal fusion similarities; (III) A multi-hierarchical tracklet clustering method using graph attention network, which balances tracking performance with inference speed. Our tracker achieves state-of-the-art results on popular MOT datasets, ensuring accurate tracking performance.
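The cross-modal contrastive learning step in (II) pairs two embeddings of the same tracklet, one from the motion (language-model) branch and one from the appearance (visual-model) branch, and pulls matched pairs together while pushing mismatched pairs apart. The abstract does not give the loss; a common choice for this setup is a symmetric InfoNCE objective, sketched below. All names (`cross_modal_infonce`, `motion_emb`, `appear_emb`) and the temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cross_modal_infonce(motion_emb, appear_emb, temperature=0.07):
    """Symmetric InfoNCE loss between two modality embeddings.

    Row i of motion_emb and row i of appear_emb describe the same
    tracklet; all other rows serve as in-batch negatives.
    """
    # L2-normalize so dot products become cosine similarities
    m = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    a = appear_emb / np.linalg.norm(appear_emb, axis=1, keepdims=True)
    logits = m @ a.T / temperature      # pairwise similarity matrix
    labels = np.arange(len(m))          # matched pairs lie on the diagonal

    def ce(l):
        # numerically stable cross-entropy of each row against its diagonal target
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # average the motion->appearance and appearance->motion directions
    return 0.5 * (ce(logits) + ce(logits.T))
```

Minimizing this loss drives the diagonal (same-tracklet) similarities above the off-diagonal ones, which is what makes the fused multi-modal similarities more discriminative for the subsequent clustering stage.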
Pages: 1 - 7
Page count: 7
Related papers
50 records
  • [21] Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering
    Xia, Wei
    Wang, Tianxiu
    Gao, Quanxue
    Yang, Ming
    Gao, Xinbo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1170 - 1183
  • [22] Multi-task clustering ELM for VIS-NIR cross-modal feature learning
    Jin, Yi
    Li, Jie
    Lang, Congyan
    Ruan, Qiuqi
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2017, 28 (03) : 905 - 920
  • [24] Cross-modal contrastive learning for multimodal sentiment recognition
    Yang, Shanliang
    Cui, Lichao
    Wang, Lei
    Wang, Tao
    APPLIED INTELLIGENCE, 2024, 54 (05) : 4260 - 4276
  • [25] Cross-Modal Graph Contrastive Learning with Cellular Images
    Zheng, Shuangjia
    Rao, Jiahua
    Zhang, Jixian
    Zhou, Lianyu
    Xie, Jiancong
    Cohen, Ethan
    Lu, Wei
    Li, Chengtao
    Yang, Yuedong
    ADVANCED SCIENCE, 2024, 11 (32)
  • [27] Hypergraph clustering based multi-label cross-modal retrieval
    Guo, Shengtang
    Zhang, Huaxiang
    Liu, Li
    Liu, Dongmei
    Lu, Xu
    Li, Liujian
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 103
  • [28] TRAJCROSS: Trajectory Cross-Modal Retrieval with Contrastive Learning
    Jing, Quanliang
    Yao, Di
    Gong, Chang
    Fan, Xinxin
    Wang, Baoli
    Tan, Haining
    Bi, Jingping
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 344 - 349
  • [29] MULTI-HIERARCHICAL INDEPENDENT CORRELATION FILTERS FOR VISUAL TRACKING
    Bai, Shuai
    He, Zhiqun
    Dong, Yuan
    Bai, Hongliang
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [30] Aggregate Tracklet Appearance Features for Multi-Object Tracking
    Chen, Long
    Ai, Haizhou
    Chen, Rui
    Zhuang, Zijie
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (11) : 1613 - 1617