Sentence Embedding and Convolutional Neural Network for Semantic Textual Similarity Detection in Arabic Language

Authors
Adnen Mahmoud
Mounir Zrigui
Affiliations
[1] University of Monastir, Algebra, Numbers Theory and Nonlinear Analyzes Laboratory LATNAL
[2] University of Sousse, Higher Institute of Computer Science and Communication Techniques, Hammam Sousse
Keywords
Arabic language; Paraphrase detection; Semantic similarity analysis; Sentence vector representation; Convolutional neural network; Natural language processing;
DOI
Not available
Abstract
The continuous growth of textual sources on the web has made paraphrasing easier. Detecting it has become a challenge in several natural language processing applications (e.g., plagiarism detection, information retrieval and extraction, and question answering). Unlike Western languages such as English, Arabic has received little attention: few works have addressed the problem of extrinsic paraphrase detection in the Arabic language. In this context, we propose a deep learning-based approach that indicates how far original and suspect documents express the same meaning. First, the word2vec algorithm extracts relevant features by predicting the neighbors of each word. Next, the resulting word vectors are averaged to generate sentence vector representations. Then, a convolutional neural network captures more contextual information and computes the degree of semantic relatedness. Given the lack of publicly available resources, a paraphrased corpus was developed using the skip-gram model, which performed better at replacing an original word with its most similar word of the same grammatical class from a vocabulary. Finally, the proposed system detects contextual relationships between Arabic documents with a precision of 85% and a recall of 86.8%, outperforming previous studies.
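The sentence-representation step described in the abstract (averaging word vectors into a sentence vector) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3-dimensional vectors in `TOY_VECTORS` are made-up values standing in for trained word2vec embeddings, the English tokens stand in for Arabic ones, and plain cosine similarity is swapped in for the convolutional neural network that the paper uses to score semantic relatedness. The function names are hypothetical.

```python
import math

# Hypothetical toy embeddings; real ones would come from a trained word2vec model.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.0, 0.9, 0.1],
    "naps": [0.1, 0.8, 0.2],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.0, 0.8],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of the tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the sentence has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; a simple stand-in for the paper's CNN scorer."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


original = sentence_vector(["the", "cat", "sleeps"], TOY_VECTORS)
paraphrase = sentence_vector(["the", "kitten", "naps"], TOY_VECTORS)
unrelated = sentence_vector(["stocks", "fell"], TOY_VECTORS)

score_paraphrase = cosine(original, paraphrase)
score_unrelated = cosine(original, unrelated)
print(score_paraphrase > score_unrelated)
```

With these toy values the paraphrased sentence scores closer to the original than the unrelated one. Averaging discards word order, which is one reason a learned model such as a CNN may be placed on top of the sentence vectors.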
Pages: 9263 - 9274
Page count: 11
Related papers
50 in total
  • [1] Sentence Embedding and Convolutional Neural Network for Semantic Textual Similarity Detection in Arabic Language
    Mahmoud, Adnen
    Zrigui, Mounir
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (11) : 9263 - 9274
  • [2] Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity
    Mahmoud, Adnen
    Zrigui, Mounir
    [J]. INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2020, 14 (01) : 35 - 50
  • [3] A novel sentence similarity model with word embedding based on convolutional neural network
    Yao, Haipeng
    Liu, Huiwen
    Zhang, Peiying
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23)
  • [4] Sentence Semantic Similarity Model Using Convolutional Neural Networks
    Karthiga, M.
    Sountharrajan, S.
    Suganya, E.
    Sankarananth, S.
    [J]. EAI Endorsed Transactions on Energy Web, 2021, 8 (35) : 1 - 6
  • [5] Detection of medical text semantic similarity based on convolutional neural network
    Zheng, Tao
    Gao, Yimei
    Wang, Fei
    Fan, Chenhao
    Fu, Xingzhi
    Li, Mei
    Zhang, Ya
    Zhang, Shaodian
    Ma, Handong
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2019, 19 (01)
  • [6] Detection of medical text semantic similarity based on convolutional neural network
    Tao Zheng
    Yimei Gao
    Fei Wang
    Chenhao Fan
    Xingzhi Fu
    Mei Li
    Ya Zhang
    Shaodian Zhang
    Handong Ma
    [J]. BMC Medical Informatics and Decision Making, 19
  • [7] Neural sentence embedding models for semantic similarity estimation in the biomedical domain
    Blagec, Kathrin
    Xu, Hong
    Agibetov, Asan
    Samwald, Matthias
    [J]. BMC BIOINFORMATICS, 2019, 20 (1)
  • [8] Neural sentence embedding models for semantic similarity estimation in the biomedical domain
    Kathrin Blagec
    Hong Xu
    Asan Agibetov
    Matthias Samwald
    [J]. BMC Bioinformatics, 20
  • [9] Interpretable Semantic Textual Similarity for Indonesian Sentence
    Rajagukguk, Rio Chandra
    Khodra, Masayu Leylia
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018 : 147 - 152
  • [10] Sentence Similarity Measurement with Convolutional Neural Networks Using Semantic and Syntactic Features
    Zhang, Shiru
    Liang, Zhiyao
    Lin, Jian
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (02) : 943 - 957