Word Embedding based Textual Semantic Similarity Measure in Bengali

Cited by: 4
Authors
Iqbal, Md Asif [1]
Sharif, Omar [1]
Hoque, Mohammed Moshiul [1]
Sarker, Iqbal H. [1]
Affiliation
[1] Chittagong Univ Engn & Technol, Dept Comp Sci & Engn, Chattogram 4349, Bangladesh
Keywords
Natural language processing; Textual semantic similarity; Word embedding; Cosine similarity; Part-of-speech weighting;
DOI
10.1016/j.procs.2021.10.010
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Textual semantic similarity is a crucial constituent of many NLP tasks such as information retrieval, machine translation, and textual forgery detection. Rule-based techniques struggle to measure semantic similarity in low-resource languages because of complex morphological structure and the scarcity of linguistic resources. This paper investigates several word embedding techniques (Word2Vec, GloVe, FastText) to estimate the semantic similarity of Bengali sentences. Because no standard dataset was available, this work developed a Bengali dataset containing 187,031 text documents with 400,824 unique words. Moreover, this work considers three semantic distance measures for computing the similarity between word vectors: cosine similarity with no weighting, with term frequency weighting, and with Part-of-Speech weighting. The performance of the proposed approach is evaluated on a developed dataset containing 50 pairs of Bengali sentences. The evaluation results show that FastText with continuous bag-of-words and a vector size of 100 achieved the highest Pearson's correlation (rho) score of 77.28%. (C) 2021 The Authors. Published by Elsevier B.V.
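The abstract describes scoring a sentence pair by cosine similarity over word vectors, optionally weighted, and evaluating with Pearson's correlation. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes sentence vectors are weighted averages of word vectors, which the abstract does not confirm. The toy two-dimensional embeddings, the English placeholder tokens, and the weight values are hypothetical.

```python
import math

def sentence_vector(tokens, embeddings, weights=None):
    """Weighted average of the word vectors of in-vocabulary tokens.

    `weights` maps a token to a scalar (e.g. a term-frequency or
    Part-of-Speech weight); missing tokens default to 1.0 (no weighting).
    Averaging is an assumption of this sketch, not taken from the paper.
    """
    weights = weights or {}
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(tok, 1.0)
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0:
        return total  # all-zero vector when nothing was found
    return [x / weight_sum for x in total]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson's correlation between predicted and reference scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy embeddings standing in for trained Bengali word vectors.
toy_embeddings = {"river": [1.0, 0.0], "water": [0.8, 0.6], "book": [0.0, 1.0]}
s1, s2 = ["river", "water"], ["water", "book"]

# Unweighted similarity.
plain = cosine_similarity(sentence_vector(s1, toy_embeddings),
                          sentence_vector(s2, toy_embeddings))

# Weighted similarity with a hypothetical weight that emphasises one token.
pos_weights = {"river": 2.0}
weighted = cosine_similarity(sentence_vector(s1, toy_embeddings, pos_weights),
                             sentence_vector(s2, toy_embeddings, pos_weights))
```

In practice the dictionary would be replaced by vectors from a trained Word2Vec, GloVe, or FastText model, and `pearson` would compare the predicted similarities against human-annotated scores for the sentence pairs. A reported score of 77.28% corresponds to a correlation of 0.7728 on the usual -1 to 1 scale.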
Pages: 92-101
Number of pages: 10