Modeling multi-prototype Chinese word representation learning for word similarity

Times cited: 2
Authors
Yin, Fulian [1]
Wang, Yanyan [1]
Liu, Jianbo [1]
Tosato, Marco [2]
Affiliations
[1] Commun Univ China, Inst Informat & Commun, Beijing 100024, Peoples R China
[2] York Univ, Lab Ind & Appl Math, Toronto, ON M3J 1P3, Canada
Funding
National Natural Science Foundation of China
Keywords
Chinese word representation; Multi-prototype; Synonym knowledge base; Word semantic disambiguation; ONTOLOGY-BASED METHODS; EMBEDDINGS; SENTIMENT;
DOI
10.1007/s40747-021-00482-y
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The word similarity task computes the similarity between any pair of words and is a fundamental technique in natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and are strongly affected by the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on a synonym knowledge base, consisting of a knowledge representation module and a word similarity module. In the first module, we propose a dual attention mechanism that combines semantic information to jointly learn word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the difficulty of semantic expression caused by insufficient data. In the word similarity module, we propose a multi-prototype representation for each word, then calculate and fuse the conceptual similarities of the two words to obtain the final result. Finally, we verify the effectiveness of our model against baseline models on three public datasets. In addition, the experiments demonstrate the stability and scalability of MP-CWR across different corpora.
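The word similarity module described above (several prototypes per word, then fusing concept-level similarities) can be illustrated with a minimal sketch. It assumes each word already has a list of prototype vectors, one per sense or synonym concept, and uses a hypothetical fusion rule: a weighted mix of the maximum and the mean pairwise cosine similarity, controlled by a hypothetical parameter `alpha`. The abstract does not specify MP-CWR's actual fusion strategy, so this is an illustration of the general multi-prototype idea, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_prototype_similarity(
    protos_a: List[Vector], protos_b: List[Vector], alpha: float = 0.5
) -> float:
    """Fuse concept-level similarities between two words.

    Each word is a list of prototype (sense/concept) vectors.
    HYPOTHETICAL fusion rule (not taken from the paper):
        alpha * max pairwise similarity + (1 - alpha) * mean pairwise similarity
    """
    if not protos_a or not protos_b:
        raise ValueError("each word needs at least one prototype vector")
    # Compare every prototype of word A with every prototype of word B.
    sims = [cosine(a, b) for a in protos_a for b in protos_b]
    best_match = max(sims)              # most closely related sense pair
    average = sum(sims) / len(sims)     # overall relatedness across all senses
    return alpha * best_match + (1.0 - alpha) * average


if __name__ == "__main__":
    # Toy 3-d prototypes: an ambiguous word with two senses vs. a one-sense word.
    ambiguous_word = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    single_sense_word = [[1.0, 0.0, 0.0]]
    print(multi_prototype_similarity(ambiguous_word, single_sense_word))
```

Taking the maximum rewards the best-matching sense pair, while the mean retains some signal from all senses; other fusion rules, such as attention-weighted sums, are equally plausible under this sketch.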
Pages: 2977-2990
Number of pages: 14