An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings

Cited by: 0
Authors
Xu, BinChen [1 ]
Ma, Lu [1 ]
Zhang, Liang [1 ]
Li, HaoHai [2 ,3 ]
Kang, Qi [1 ]
Zhou, MengChu [4 ]
Affiliations
[1] Tongji Univ, Sch Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] Ravenscroft Sch, Raleigh, NC 27615 USA
[3] Bi DaAI Lab, Beijing, Peoples R China
[4] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Keywords
DOI
10.1109/coase.2019.8843151
CLC classification
TP [automation technology, computer technology];
Subject classification code
0812
Abstract
Word representations are crucial for many natural language processing tasks. Most existing approaches learn contextual information by assigning a distinct vector to each word and pay little attention to morphology, which makes it difficult for them to handle large vocabularies and rare words. In this paper we propose an Adaptive Wordpiece Language Model for learning Chinese word embeddings (AWLM), inspired by the previous observation that subword units are important for improving the learning of Chinese word representations. Specifically, a novel approach called BPE+ is established to adaptively generate variable-length grams, which breaks the fixed-length limitation of stroke n-grams. Semantic information extraction is accomplished by three carefully designed parts, i.e., extraction of morphological information, reinforcement of fine-grained information, and extraction of semantic information. Empirical results on word similarity, word analogy, text classification, and question answering verify that our method significantly outperforms several state-of-the-art methods.
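The abstract describes BPE+ as adaptively generating variable-length subword units from stroke sequences, but gives no implementation details here. As a rough, purely illustrative sketch of the underlying idea, the following Python snippet implements standard byte-pair-encoding merge learning over stroke sequences; the function name, merge criterion, and toy corpus are assumptions for illustration and are not the paper's BPE+ algorithm.

from collections import Counter

def learn_bpe_merges(corpus, num_merges):
    """Illustrative sketch: standard BPE merge learning over stroke sequences.

    `corpus` is a list of words, each given as a tuple of stroke symbols.
    Repeatedly merging the most frequent adjacent symbol pair lets frequent
    stroke patterns grow into single variable-length units, rather than
    being limited to fixed-length stroke n-grams.
    """
    vocab = Counter(corpus)          # word (tuple of symbols) -> frequency
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the merge to every word in the vocabulary.
        merged_vocab = Counter()
        for word, freq in vocab.items():
            new_word, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    new_word.append(word[i] + word[i + 1])
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            merged_vocab[tuple(new_word)] += freq
        vocab = merged_vocab
    return merges

# Toy usage: stroke symbols are written as letters for readability.
corpus = [tuple("abcab"), tuple("abcd"), tuple("abab")]
print(learn_bpe_merges(corpus, 3))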
Pages: 812-817
Number of pages: 6
Related papers (50 total)
  • [1] Improved Learning of Chinese Word Embeddings with Semantic Knowledge
    Yang, Liner
    Sun, Maosong
    [J]. CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 15 - 25
  • [2] Adaptive GloVe and FastText Model for Hindi Word Embeddings
    Gaikwad, Vijay
    Haribhakta, Yashodhara
    [J]. PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 175 - 179
  • [3] Definition Modeling: Learning to Define Word Embeddings in Natural Language
    Noraset, Thanapon
    Liang, Chen
    Birnbaum, Larry
    Downey, Doug
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3259 - 3266
  • [4] Learning Chinese word embeddings from semantic and phonetic components
    Wang, Fu Lee
    Lu, Yuyin
    Cheng, Gary
    Xie, Haoran
    Rao, Yanghui
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (29) : 42805 - 42820
  • [5] Learning Chinese word embeddings from semantic and phonetic components
    Fu Lee Wang
    Yuyin Lu
    Gary Cheng
    Haoran Xie
    Yanghui Rao
    [J]. Multimedia Tools and Applications, 2022, 81 : 42805 - 42820
  • [6] Learning chinese word embeddings from character structural information
    Ma, Bing
    Qi, Qi
    Liao, Jianxin
    Sun, Haifeng
    Wang, Jingyu
    [J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
  • [7] Word Embeddings for the Polish Language
    Rogalski, Marek
    Szczepaniak, Piotr S.
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2016, 2016, 9692 : 126 - 135
  • [8] Adaptive Compression of Word Embeddings
    Kim, Yeachan
    Kim, Kang-Min
    Lee, SangKeun
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3950 - 3959
  • [9] Chinese Word Embeddings with Subwords
    Yang, Gang
    Xu, Hongzhe
    Li, Wen
    [J]. 2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018,
  • [10] Word Dependency Sketch for Chinese Language Learning
    Shih, Meng-Hsien
    Hsieh, Shu-Kai
    [J]. CONCENTRIC-STUDIES IN LINGUISTICS, 2016, 42 (01) : 45 - 72