Empirical Exploring Word-Character Relationship for Chinese Sentence Representation

Cited by: 8
Authors
Wang, Shaonan [1]
Zhang, Jiajun [1]
Zong, Chengqing [2]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Intelligence Bldg,498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence Techn, Natl Lab Pattern Recognit,Inst Automat, Intelligence Bldg,498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Keywords
Sentence representation; composition model; inner-word character; mixed character-word representation; mask gate; max pooling
DOI
10.1145/3156778
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters that carry rich semantic information, yet existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by using the semantic information of inner-word characters. We propose two strategies to this end. The first applies a mask gate to characters, learning the relations among the characters in a word. The second applies a max-pooling operation to words to adaptively find the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models and achieves substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we use the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. The results show that the proposed mixed character-word sentence representation models outperform both the character-based and the word-based models.
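The two strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the gate equations, so the elementwise sigmoid gate, the parameter names `gate_weights` and `gate_bias`, the summation of gated characters, and the averaging sentence composition are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def masked_char_composition(char_vecs, gate_weights, gate_bias):
    """Build a compositional word vector from character vectors.

    Assumption: each character dimension is scaled by a gate
    sigmoid(gate_weights[k] * c[k] + gate_bias[k]) and the gated
    characters are summed. The paper's actual gate may differ.
    """
    dim = len(char_vecs[0])
    composed = [0.0] * dim
    for c in char_vecs:
        for k in range(dim):
            gate = sigmoid(gate_weights[k] * c[k] + gate_bias[k])
            composed[k] += gate * c[k]
    return composed


def mix_word(word_vec, composed_vec):
    """Elementwise max pooling over the atomic (word-level) vector and
    the compositional (character-based) vector."""
    return [max(a, b) for a, b in zip(word_vec, composed_vec)]


def sentence_vector(mixed_words):
    """Assumed sentence composition model: average of mixed word vectors."""
    n = len(mixed_words)
    return [sum(column) / n for column in zip(*mixed_words)]


# Toy two-character word with 2-dimensional embeddings (made-up values).
chars = [[2.0, -4.0], [4.0, 2.0]]
# Zero weights and bias make every gate equal to sigmoid(0) = 0.5.
composed = masked_char_composition(chars, [0.0, 0.0], [0.0, 0.0])
mixed = mix_word([1.0, 0.5], composed)
sentence = sentence_vector([mixed, [1.0, 1.5]])
print(composed, mixed, sentence)
```

With the toy values above, the max pooling keeps the character-based value in the first dimension and the word-level value in the second, which is the kind of per-dimension mixture of atomic and compositional representations the abstract describes.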
Pages: 18