From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

Cited by: 0
Authors:
Sun, Li [1]
Luisier, Florian [2]
Batmanghelich, Kayhan [1]
Florencio, Dinei [2]
Zhang, Cha [2]
Affiliations:
[1] Boston Univ, Boston, MA 02215 USA
[2] Microsoft, Redmond, WA USA
Keywords:
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens. This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes. Such a fixed vocabulary limits the model's robustness to spelling errors and its capacity to adapt to new domains. In this work, we introduce a novel open-vocabulary language model that adopts a hierarchical two-level approach: one at the word level and another at the sequence level. Concretely, we design an intra-word module that uses a shallow Transformer architecture to learn word representations from their characters, and a deep inter-word Transformer module that contextualizes each word representation by attending to the entire word sequence. Our model thus operates directly on character sequences with explicit awareness of word boundaries, but without a biased sub-word or word-level vocabulary. Experiments on various downstream tasks show that our method outperforms strong baselines. We also demonstrate that our hierarchical model is robust to textual corruption and domain shift.
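The abstract describes a two-level architecture: a shallow intra-word Transformer that composes each word's representation from its characters, and a deep inter-word Transformer that contextualizes those word vectors across the whole sequence. Below is a minimal PyTorch sketch of that idea; the layer counts, hidden size, byte-level character vocabulary, and mean-pooling over characters are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a hierarchical character-to-word encoder (assumptions:
# 256-symbol byte vocabulary, 2 intra-word / 12 inter-word layers, mean
# pooling; the paper's actual hyperparameters may differ).
import torch
import torch.nn as nn

class HierarchicalCharWordEncoder(nn.Module):
    def __init__(self, n_chars=256, d_model=256, max_word_len=20,
                 intra_layers=2, inter_layers=12, n_heads=8):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_model, padding_idx=0)
        self.char_pos = nn.Embedding(max_word_len, d_model)
        intra = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        inter = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Shallow intra-word encoder: builds one vector per word from chars.
        self.intra = nn.TransformerEncoder(intra, intra_layers)
        # Deep inter-word encoder: contextualizes words across the sequence.
        self.inter = nn.TransformerEncoder(inter, inter_layers)

    def forward(self, char_ids):
        # char_ids: (batch, n_words, max_word_len); 0 marks character padding.
        b, w, c = char_ids.shape
        flat = char_ids.view(b * w, c)
        pos = torch.arange(c, device=char_ids.device)
        x = self.char_emb(flat) + self.char_pos(pos)
        pad = flat.eq(0)  # True where a character position is padding
        h = self.intra(x, src_key_padding_mask=pad)
        # Mean-pool over non-pad characters to get one embedding per word
        # (an assumption; a learned word-marker token would also work).
        keep = (~pad).unsqueeze(-1).float()
        words = (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        words = words.view(b, w, -1)
        return self.inter(words)  # (batch, n_words, d_model)

model = HierarchicalCharWordEncoder()
chars = torch.randint(1, 256, (2, 16, 20))  # 2 sentences, 16 words each
print(model(chars).shape)  # torch.Size([2, 16, 256])
```

Because the character vocabulary is closed (e.g., bytes) while words are composed on the fly, a model of this shape has no fixed word list to go out of vocabulary, which is what makes the open-vocabulary and misspelling-robustness claims plausible.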
Pages: 3605-3620
Page count: 16