Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features

Cited by: 67
Authors
Al-Smadi, Mohammad [1]
Jaradat, Zain [1]
Al-Ayyoub, Mahmoud [1]
Jararweh, Yaser [1]
Affiliation
[1] Jordan Univ Sci & Technol, Dept Comp Sci, POB 3030, Irbid 22110, Jordan
Keywords
Paraphrase identification; Semantic text similarity; Semantic analysis; Arabic language; Natural language processing
DOI
10.1016/j.ipm.2017.01.002
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The rapid growth of digital information has raised considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter share a great deal of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimental results show that the approach performs well compared with the baseline. (c) 2017 Elsevier Ltd. All rights reserved.
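The difference between PI (a yes/no decision) and STS (a degree of similarity) can be illustrated with a single lexical feature. The sketch below is only an illustration under stated assumptions: it uses Jaccard word overlap as a stand-in lexical feature and a fixed threshold in place of a trained classifier. The function names `tokens`, `sts_score`, and `is_paraphrase` and the threshold value are hypothetical, and none of this is taken from the paper's actual feature set or its MaxEnt/SVR models.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens.

    In Python 3, \\w matches Unicode word characters, so Arabic letters
    are tokenized as well (no Arabic-specific normalization is done here).
    """
    return set(re.findall(r"\w+", text.lower()))


def sts_score(a, b):
    """STS-style output: a degree of similarity in [0, 1] (Jaccard overlap)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # assumption: two empty texts count as identical
    return len(ta & tb) / len(ta | tb)


def is_paraphrase(a, b, threshold=0.5):
    """PI-style output: a binary decision.

    The fixed threshold is an assumption standing in for a trained classifier.
    """
    return sts_score(a, b) >= threshold
```

Calling `sts_score` on two tweets returns a graded score, while `is_paraphrase` collapses that score into a binary label. A real system would combine many such features and learn the decision boundary from labeled data.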
Pages: 640-652
Number of pages: 13
Related papers
50 in total
  • [21] Measuring the short text similarity based on semantic and syntactic information
    Yang, Jiaqi
    Li, Yongjun
    Gao, Congjie
    Zhang, Yinyin
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 114 : 169 - 180
  • [22] TEXT CONTENT ANALYSIS USING ONTOLOGY AND SEMANTIC SIMILARITY
    Prodanovic, Dejan
    Furlan, Bojan
    Nikolic, Bosko
    2014 22ND TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2014, : 1126 - 1129
  • [23] Combining Lexical and Semantic Features for Short Text Classification
    Yang, Lili
    Li, Chunping
    Ding, Qiang
    Li, Li
    17TH INTERNATIONAL CONFERENCE IN KNOWLEDGE BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS - KES2013, 2013, 22 : 78 - 86
  • [24] Sentence Similarity Using Syntactic and Semantic Features for Multi-document Summarization
    Anjaneyulu, M.
    Sarma, S. S. V. N.
    Reddy, P. Vijaya Pal
    Chander, K. Prem
    Nagaprasad, S.
    INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, VOL 2, 2019, 56 : 471 - 485
  • [25] Sentence Similarity Measurement with Convolutional Neural Networks Using Semantic and Syntactic Features
    Zhang, Shiru
    Liang, Zhiyao
    Lin, Jian
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (02) : 943 - 957
  • [26] Text Similarity Based on Semantic Analysis
    Wang, Junli
    Zhou, Qing
    Sun, Guobao
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRIAL ENGINEERING (AIIE 2016), 2016, 133 : 303 - 307
  • [27] A Short-Text Similarity Model Combining Semantic and Syntactic Information
    Zhou, Ya
    Li, Cheng
    Huang, Guimin
    Guo, Qingkai
    Li, Hui
    Wei, Xiong
    ELECTRONICS, 2023, 12 (14)
  • [28] Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
    Grego, Tiago
    Couto, Francisco M.
    PLOS ONE, 2013, 8 (05):
  • [29] Enhancing Aspect-Based Sentiment Analysis of Arabic Hotels' reviews using morphological, syntactic and semantic features
    Al-Smadi, Mohammad
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Qawasmeh, Omar
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (02) : 308 - 319
  • [30] Using Standardized Lexical Semantic Knowledge to Measure Similarity
    Wali, Wafa
    Gargouri, Bilel
    Ben Hamadou, Abdelmajid
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2014, 2014, 8793 : 93 - 104