A novel sentence similarity model with word embedding based on convolutional neural network

Cited by: 20
Authors
Yao, Haipeng [1]
Liu, Huiwen [1]
Zhang, Peiying [1, 2]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] China Univ Petr, Coll Comp & Commun Engn, Qingdao, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
convolutional neural network; sentence similarity; word embedding;
DOI
10.1002/cpe.4415
Chinese Library Classification number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
In this paper, we propose an effective model for measuring the similarity of English sentences. The model first uses word embeddings and a convolutional neural network (CNN) to produce a sentence vector, and then uses the resulting pair of sentence vectors to compute a sentence similarity score. To account for long-range semantic dependencies between words, we propose a novel method that transforms word embeddings to construct a three-dimensional sentence feature tensor. In addition, we incorporate k-max pooling into the CNN so that it can handle input sentences of variable length. The proposed model requires no external resources such as WordNet or parse trees, and it takes very little time to train. Finally, we carried out extensive simulations to compare the performance of our model with that of other state-of-the-art works. Experimental results on the SemEval 2014 task (SICK test corpus) indicate that our model achieves good performance in terms of Pearson correlation coefficient, Spearman correlation coefficient, and mean squared error. Furthermore, experimental results on Microsoft Research paraphrase identification (MSRP) indicate that our model achieves excellent performance in terms of F1 and accuracy.
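The abstract mentions k-max pooling as the way the CNN copes with sentences of different lengths. As a rough illustration only, the sketch below shows the general idea in plain Python: keep the k largest activations of a feature sequence in their original order, so that every sentence yields a fixed-length output. The function name `k_max_pooling`, the zero padding for inputs shorter than k, and the sample values are assumptions made for this example; the record does not state which variant or which k the authors use.

```python
from typing import List, Sequence


def k_max_pooling(values: Sequence[float], k: int) -> List[float]:
    """Return the k largest values of a sequence, kept in their original order.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: inputs shorter than k are right-padded with zeros so the
    # output always has exactly k entries.
    padded = list(values) + [0.0] * max(0, k - len(values))
    # Indices of the k largest entries (ties go to the earlier position,
    # because Python's sort is stable even with reverse=True).
    top = sorted(range(len(padded)), key=lambda i: padded[i], reverse=True)[:k]
    # Restore the original left-to-right order of the selected entries.
    return [padded[i] for i in sorted(top)]


# Two "sentences" of different lengths map to outputs of the same length.
long_features = [0.1, 0.9, 0.3, 0.7, 0.2]
short_features = [0.5, -1.0]
print(k_max_pooling(long_features, 3))   # [0.9, 0.3, 0.7]
print(k_max_pooling(short_features, 3))  # [0.5, -1.0, 0.0]
```

Because the selected values keep their relative positions, the pooled output retains some word-order information that plain max pooling would discard.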
Pages: 12