Cross-modal contrastive learning with multi-hierarchical tracklet clustering for multi object tracking

Cited by: 0
Authors
Hong, Ru
Yang, Jiming
Cai, Zeyu
Da, Feipeng [1]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
Keywords
Multi-object tracking; Contrastive learning; Multi-modal feature fusion
DOI
10.1016/j.patrec.2025.02.032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The tracklet-based offline multi-object tracking (MOT) paradigm addresses the long-term association problem of online trackers by applying global optimization to tracklet clustering across a video. Accurate offline MOT depends on establishing robust similarity between tracklets from both their temporal motion cues and their appearance cues. To this end, we propose a multi-hierarchical tracklet clustering method based on cross-modal contrastive learning, called MHCM2DMOT. The method incorporates three key techniques: (I) a tracklet generation strategy based on motion association uniqueness, which associates objects efficiently across consecutive frames while preserving identity uniqueness; (II) encoding of tracklet motion and appearance cues through language and visual models respectively, with cross-modal contrastive learning strengthening the interaction between the two modalities to produce more discriminative multi-modal fusion similarities; (III) a multi-hierarchical tracklet clustering method using a graph attention network, which balances tracking performance against inference speed. Our tracker achieves state-of-the-art results on popular MOT datasets.
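The cross-modal contrastive step in (II) can be illustrated with a small sketch. The abstract does not state the loss the authors use, so the symmetric InfoNCE objective below is an assumption, and the names `symmetric_info_nce`, `motion_emb`, `appearance_emb`, and `temperature` are hypothetical. It takes one motion embedding and one appearance embedding per tracklet, treats the pair from the same tracklet as the positive, and treats every other pairing in the batch as a negative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(motion_emb, appearance_emb, temperature=0.07):
    """Symmetric InfoNCE loss between two modalities (an assumed loss, not the paper's).

    Row i of motion_emb and row i of appearance_emb are assumed to describe
    the same tracklet; all other row pairings act as negatives.
    """
    m = l2_normalize(np.asarray(motion_emb, dtype=float))
    a = l2_normalize(np.asarray(appearance_emb, dtype=float))
    logits = m @ a.T / temperature          # (N, N) scaled cosine similarities
    idx = np.arange(len(m))
    # Motion -> appearance: each row should pick its own column.
    loss_m2a = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Appearance -> motion: each column should pick its own row.
    loss_a2m = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_m2a + loss_a2m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motion = rng.normal(size=(8, 16))
    # Hypothetical appearance embeddings that roughly agree with the motion ones.
    appearance = motion + 0.01 * rng.normal(size=(8, 16))
    print("aligned pairs:   ", symmetric_info_nce(motion, appearance))
    print("mismatched pairs:", symmetric_info_nce(motion, np.roll(appearance, 1, axis=0)))
```

Under these assumptions the loss is low when matching rows agree and high when the pairing is shuffled, which is the property a contrastive objective relies on to pull the two modalities of one tracklet together.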
Pages: 1 / 7
Page count: 7