CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Cited by: 0
Authors
Cai, Hongyi [1]
Zhu, Anna [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024
Keywords
Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature;
DOI
10.1109/ICIP51287.2024.10647599
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Chinese character recognition (CCR) is an important problem in computer vision because of its complexity and wide range of applications. The very large number of character categories makes recognizing unseen characters especially challenging. To address this zero-shot setting, we propose a CLIP-style model that independently extracts features from aligned Chinese character images and Ideographic Description Sequences (IDS) and aligns them across modalities. Our approach aligns both local and global features. First, we introduce learnable discrete tokens that serve as shared embeddings for the visual and textual modalities and capture the local context of Chinese characters. Next, each radical is encoded to extract local features, which are mapped to the shared discrete tokens via attention mechanisms. In addition, the entire character is encoded to obtain global features. Training uses a contrastive loss to drive cross-modal alignment. Experimental results show that our method outperforms conventional approaches on zero-shot Chinese character recognition benchmarks.
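The contrastive training objective mentioned in the abstract can be illustrated with a minimal sketch of a symmetric CLIP-style loss over a batch of matching image/IDS feature pairs. This is not the authors' implementation: the function names, the toy two-dimensional features, and the temperature value of 0.07 are all assumptions made for illustration, and the record does not say whether the paper uses this exact formulation.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_feats, ids_feats, temperature=0.07):
    # image_feats[i] and ids_feats[i] are assumed to describe the same character.
    # temperature is a hypothetical hyperparameter, not a value from the paper.
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in ids_feats]
    # Cosine similarity between every image and every IDS embedding.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-IDS and IDS-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy batch of two characters with 2-D features.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each image embedding matches its own IDS embedding and much larger when the pairs are swapped, which is the behavior a contrastive objective relies on to pull matching modalities together.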
Pages: 2041-2047
Page count: 7
Related papers
50 in total
  • [31] Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
    Hu, Yuchen
    Li, Ruizhe
    Chen, Chen
    Zou, Heqing
    Zhu, Qiushi
    Chng, Eng Siong
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 5076 - 5084
  • [32] Cross-Modal Zero-Shot-Learning for Tactile Object Recognition
    Liu, Huaping
    Sun, Fuchun
    Fang, Bin
    Guo, Di
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (07) : 2466 - 2474
  • [33] Mining Contrastive Relations Between Cross-Modal Features for Zero-Shot Remote Sensing Image Scene Classification
    Liu, Chun
    Ma, Suqiang
    Li, Zheng
    Yang, Wei
    Han, Zhigang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [34] Cross-Modal Attention Alignment Network with Auxiliary Text Description for Zero-Shot Sketch-Based Image Retrieval
    Su, Hanwen
    Song, Ge
    Huang, Kai
    Wang, Jiyan
    Yang, Ming
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VI, 2024, 15021 : 52 - 65
  • [35] Towards Zero-shot Learning for End-to-end Cross-modal Translation Models
    Yang, Jichen
    Fang, Kai
    Liao, Minpeng
    Chen, Boxing
    Huang, Zhongqiang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 13078 - 13087
  • [36] Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval
    Xu, Xing
    Lu, Huimin
    Song, Jingkuan
    Yang, Yang
    Shen, Heng Tao
    Li, Xuelong
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (06) : 2400 - 2413
  • [37] Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
    Mercea, Otniel-Bogdan
    Riesch, Lukas
    Koepke, A. Sophia
    Akata, Zeynep
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10543 - 10553
  • [38] RADICAL ANALYSIS NETWORK FOR ZERO-SHOT LEARNING IN PRINTED CHINESE CHARACTER RECOGNITION
    Zhang, Jianshu
    Zhu, Yixing
    Du, Jun
    Dai, Lirong
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [39] Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval
    Li, Chuang
    Fei, Lunke
    Kang, Peipei
    Liang, Jiahao
    Fang, Xiaozhao
    Teng, Shaohua
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 459 - 472
  • [40] INTER-MODALITY FUSION BASED ATTENTION FOR ZERO-SHOT CROSS-MODAL RETRIEVAL
    Chakraborty, Bela
    Wang, Peng
    Wang, Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2648 - 2652