Modeling multi-prototype Chinese word representation learning for word similarity

Cited by: 2
Authors
Yin, Fulian [1]
Wang, Yanyan [1]
Liu, Jianbo [1]
Tosato, Marco [2]
Affiliations
[1] Commun Univ China, Inst Informat & Commun, Beijing 100024, Peoples R China
[2] York Univ, Lab Ind & Appl Math, Toronto, ON M3J 1P3, Canada
Funding
National Natural Science Foundation of China
Keywords
Chinese word representation; Multi-prototype; Synonym knowledge base; Word semantic disambiguation; ONTOLOGY-BASED METHODS; EMBEDDINGS; SENTIMENT;
DOI
10.1007/s40747-021-00482-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The word similarity task computes the similarity of any pair of words and is a basic technology of natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and are strongly influenced by the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on a synonym knowledge base; it comprises a knowledge representation module and a word similarity module. In the first module, we propose a dual attention mechanism that combines semantic information to jointly learn word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the difficulty of semantic expression caused by insufficient data. In the word similarity module, we propose a multi-prototype representation for each word, then calculate and fuse the conceptual similarities of two words to obtain the final result. Finally, we verify the effectiveness of our model against baseline models on three public data sets. The experiments also demonstrate the stability and scalability of the MP-CWR model across different corpora.
Pages: 2977-2990
Page count: 14
Source
Complex & Intelligent Systems, 2021, 7: 2977-2990