Word Embedding based Textual Semantic Similarity Measure in Bengali

Cited by: 4
Authors
Iqbal, Md Asif [1 ]
Sharif, Omar [1 ]
Hoque, Mohammed Moshiul [1 ]
Sarker, Iqbal H. [1 ]
Affiliations
[1] Chittagong Univ Engn & Technol, Dept Comp Sci & Engn, Chattogram 4349, Bangladesh
Keywords
Natural language processing; Textual semantic similarity; Word embedding; Cosine similarity; Part-of-speech weighting
DOI
10.1016/j.procs.2021.10.010
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Textual semantic similarity is a crucial component of many NLP tasks, such as information retrieval, machine translation and textual forgery detection. Measuring semantic similarity with rule-based techniques is difficult in low-resource languages because of their complex morphological structure and the scarcity of linguistic resources. This paper investigates several word embedding techniques (Word2Vec, GloVe, FastText) for estimating the semantic similarity of Bengali sentences. Because no standard dataset is available, this work developed a Bengali dataset containing 187031 text documents with 400824 unique words. Moreover, this work considers three semantic distance measures for computing the similarity between word vectors, all based on cosine similarity: no weighting, term-frequency weighting and part-of-speech weighting. The performance of the proposed approach is evaluated on a developed dataset containing 50 pairs of Bengali sentences. The evaluation shows that FastText trained with continuous bag-of-words and a vector size of 100 achieved the highest Pearson's correlation (ρ) score of 77.28%. © 2021 The Authors. Published by Elsevier B.V.
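The three cosine-based measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formulas, so the sketch assumes a sentence is represented by a weighted average of its word vectors. The toy English vectors, the POS tags, the POS weight values and the function names are all hypothetical stand-ins for a trained Bengali embedding model and tagger.

```python
from collections import Counter

import numpy as np

DIM = 3
# Toy vectors standing in for trained Word2Vec/GloVe/FastText embeddings (hypothetical values).
EMBEDDINGS = {
    "the": np.array([0.3, 0.3, 0.3]),
    "river": np.array([0.9, 0.1, 0.0]),
    "water": np.array([0.8, 0.2, 0.1]),
    "flows": np.array([0.1, 0.9, 0.0]),
    "runs": np.array([0.2, 0.8, 0.1]),
}
# Hypothetical POS tags and weights; the paper's actual values are not stated in the abstract.
POS_TAGS = {"the": "DET", "river": "NOUN", "water": "NOUN", "flows": "VERB", "runs": "VERB"}
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "DET": 0.1}


def token_weights(tokens, scheme):
    """Return one weight per token for the chosen scheme: 'none', 'tf' or 'pos'."""
    if scheme == "none":
        return np.ones(len(tokens))
    if scheme == "tf":
        counts = Counter(tokens)  # assumption: term frequency within the sentence
        return np.array([counts[t] / len(tokens) for t in tokens])
    if scheme == "pos":
        return np.array([POS_WEIGHTS.get(POS_TAGS.get(t), 0.5) for t in tokens])
    raise ValueError(f"unknown scheme: {scheme}")


def sentence_vector(tokens, scheme="none"):
    """Weighted average of the word vectors of in-vocabulary tokens."""
    known = [t for t in tokens if t in EMBEDDINGS]
    if not known:
        return np.zeros(DIM)
    weights = token_weights(known, scheme)
    total = weights.sum()
    if total == 0:
        return np.zeros(DIM)
    vectors = np.stack([EMBEDDINGS[t] for t in known])
    return weights @ vectors / total


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def similarity(tokens_a, tokens_b, scheme="none"):
    """Cosine similarity between the weighted sentence vectors of two token lists."""
    return cosine(sentence_vector(tokens_a, scheme), sentence_vector(tokens_b, scheme))


def pearson(predicted, gold):
    """Pearson's correlation between predicted and reference similarity scores."""
    return float(np.corrcoef(predicted, gold)[0, 1])


for scheme in ("none", "tf", "pos"):
    print(scheme, similarity(["the", "river", "flows"], ["the", "water", "runs"], scheme))
```

In this sketch the part-of-speech scheme reduces the influence of function words such as "the", so the score is driven mainly by nouns and verbs. The `pearson` helper shows how predicted scores could be compared against human-annotated scores, as in the evaluation the abstract describes.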
Pages: 92-101
Number of pages: 10
Related papers
50 in total
  • [11] Multilingual Semantic Textual Similarity using Multilingual Word Representations
    Ahmed, Mahtab
    Dixit, Chahna
    Mercer, Robert E.
    Khan, Atif
    Samee, Muhammad Rifayat
    Urra, Felipe
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2020), 2020, : 194 - 198
  • [12] Attentive Siamese LSTM Network for Semantic Textual Similarity Measure
    Bao, Wei
    Bao, Wugedele
    Du, Jinhua
    Yang, Yuanyuan
    Zhao, Xiaobing
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 312 - 317
  • [13] An Arabic Word Similarity Measure for Semantic Conversational Agents
    Noori, Zaid
    Crockett, Keeley
    Bandar, Zuhair
    Al-Mousa, Mohammed
    2018 IEEE 2ND INTERNATIONAL WORKSHOP ON ARABIC AND DERIVED SCRIPT ANALYSIS AND RECOGNITION (ASAR), 2018, : 119 - 123
  • [14] Attention-Based Overall Enhance Network for Chinese Semantic Textual Similarity Measure
    Zhang, Hao
    Zhang, HuaXiong
    Lu, XingYu
    Gao, Qiang
    JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2022, 25 (02): : 287 - +
  • [15] The Semantic Similarity Relation of Entities Discovery: Using Word Embedding
    Ruan, Dong-ru
    Mao, Yu-xin
    Pan, Hong-yan
    Gao, Kai
    2017 9TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION AND CONTROL (ICMIC 2017), 2017, : 845 - 850
  • [16] A survey on word embedding techniques and semantic similarity for paraphrase identification
    Kubal, Divesh R.
    Nimkar, Anant V.
    International Journal of Computational Systems Engineering, 2019, 5 (01) : 36 - 52
  • [17] A New Measure of Word Semantic Similarity based on WordNet Hierarchy and DAG Theory
    Qin, Peng
    Lu, Zhao
    Yan, Yu
    Wu, Fang
    WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, : 181 - 185
  • [18] Automated Short-Answer Grading using Semantic Similarity based on Word Embedding
    Lubis, Fetty Fitriyanti
    Mutaqin
    Putri, Atina
    Waskita, Dana
    Sulistyaningtyas, Tri
    Arman, Arry Akhmad
    Rosmansyah, Yusep
    INTERNATIONAL JOURNAL OF TECHNOLOGY, 2021, 12 (03) : 571 - 581
  • [19] Semantic textual similarity between sentences using bilingual word semantics
    Shajalal, Md
    Aono, Masaki
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2019, 8 (02) : 263 - 272