CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Authors
Cai, Hongyi [1]
Zhu, Anna [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024
Keywords
Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature;
DOI
10.1109/ICIP51287.2024.10647599
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Chinese character recognition (CCR) is a pivotal domain in computer vision due to its complexity and diverse applications, especially given the extensive character categories posing challenges in identifying unseen characters. Addressing the zero-shot hurdle, we propose a CLIP-style model, which independently extracts features from aligned Chinese character images and Ideographic Description Sequences (IDS), achieving cross-modal alignment. Our approach encompasses local and global feature alignment. Initially, we introduce learnable discrete tokens to represent shared embeddings for visual and textual modalities, capturing the local context of Chinese characters. Then, encoding each radical extracts local features, mapped to shared discrete tokens via attention mechanisms. Additionally, encoding the entire character obtains global features. Training utilizes contrastive loss to facilitate cross-modal alignment. Experimental results confirm our method's superiority over conventional approaches, demonstrating remarkable performance on zero-shot Chinese character recognition benchmarks.
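The two mechanisms the abstract names, attention from local features onto shared tokens and a contrastive loss between image and IDS embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the scaled dot-product form of the attention, the symmetric InfoNCE form of the loss, and the temperature of 0.07 are all assumptions made for illustration, since the record does not specify them.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def attend_to_shared_tokens(local_feats, shared_tokens):
    """Hypothetical helper: map local features (n, d) onto shared tokens (k, d).

    Uses scaled dot-product attention (an assumption); each output row is a
    convex combination of the shared token embeddings.
    """
    d = local_feats.shape[-1]
    scores = local_feats @ shared_tokens.T / np.sqrt(d)   # (n, k)
    weights = np.exp(log_softmax(scores, axis=-1))        # rows sum to 1
    return weights @ shared_tokens                        # (n, d)


def symmetric_contrastive_loss(image_emb, ids_emb, temperature=0.07):
    """Hypothetical helper: CLIP-style symmetric InfoNCE loss.

    Row i of `image_emb` and row i of `ids_emb` are treated as a matched
    pair; every other row in the batch acts as a negative.
    """
    image_emb = l2_normalize(image_emb)
    ids_emb = l2_normalize(ids_emb)
    logits = image_emb @ ids_emb.T / temperature          # (b, b)
    idx = np.arange(logits.shape[0])
    loss_image_to_ids = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_ids_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_ids + loss_ids_to_image)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    radicals = rng.normal(size=(3, 8))   # 3 local (radical) features, dim 8
    tokens = rng.normal(size=(16, 8))    # 16 shared learnable tokens (fixed here)
    print(attend_to_shared_tokens(radicals, tokens).shape)

    aligned = np.eye(4)
    print(symmetric_contrastive_loss(aligned, aligned))
    print(symmetric_contrastive_loss(aligned, np.roll(aligned, 1, axis=0)))
```

In this sketch the loss is close to zero when each image embedding matches its own IDS embedding and grows large when the pairing is shifted, which is the behaviour a contrastive objective relies on to pull matched pairs together.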
Pages: 2041-2047
Page count: 7