Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features

Cited by: 67
Authors
Al-Smadi, Mohammad [1]
Jaradat, Zain [1]
Al-Ayyoub, Mahmoud [1]
Jararweh, Yaser [1]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Sci, POB 3030, Irbid 22110, Jordan
Keywords
Paraphrase identification; Semantic text similarity; Semantic analysis; Arabic language; Natural language processing;
DOI
10.1016/j.ipm.2017.01.002
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The rapid growth of digital information has raised considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter carry a large amount of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. A Maximum Entropy (MaxEnt) classifier and a Support Vector Regression (SVR) model are trained using these features and evaluated on a dataset prepared for this research. The experimental results show that the approach achieves good results in comparison to the baseline. (c) 2017 Elsevier Ltd. All rights reserved.
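As a rough illustration of what lexical features for a pair of texts can look like, the sketch below computes three simple overlap scores. It is not the feature set used in the paper, which this record does not list: the function names (`tokens`, `char_ngrams`, `jaccard`, `lexical_features`) and the choice of word-level Jaccard, character-trigram Jaccard, and length ratio are assumptions made for illustration only. In a pipeline like the one the abstract describes, scores of this kind would be passed, together with syntactic and semantic features, to a classifier for PI and to a regressor for STS.

```python
import re


def tokens(text):
    # Lowercased word tokens; \w is Unicode-aware for str patterns in Python 3.
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=3):
    # Character n-grams over the normalized, space-joined token string.
    s = " ".join(tokens(text))
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    # Jaccard similarity of two collections; two empty inputs count as identical.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lexical_features(t1, t2):
    # Hypothetical feature vector for one text pair (illustrative only).
    w1, w2 = tokens(t1), tokens(t2)
    return {
        "word_jaccard": jaccard(w1, w2),
        "char_trigram_jaccard": jaccard(char_ngrams(t1), char_ngrams(t2)),
        "length_ratio": min(len(w1), len(w2)) / max(len(w1), len(w2), 1),
    }
```

A PI model would map such a feature vector to a yes/no decision, while an STS model would map the same vector to a graded similarity score.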
Pages: 640-652
Number of pages: 13
Related papers
50 in total
  • [1] A Text Semantic Similarity Approach for Arabic Paraphrase Detection
    Mahmoud, Adnen
    Zrigui, Ahmed
    Zrigui, Mounir
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 338 - 349
  • [2] Question to Question Similarity Analysis Using Morphological, Syntactic, Semantic, and Lexical Features
    Hammad, Mahmoud M.
    Al-Smadi, Mohammad
    Baker, Qanita Bani
    Al-asa'd, Muntaha
    Al-khdour, Nour
    Younes, Mutaz Bni
    Khwaileh, Enas
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2020, 26 (06) : 671 - 697
  • [3] Question to Question Similarity Analysis Using Morphological, Syntactic, Semantic, and Lexical Features
    Al-asa'd, Muntaha
    Al-khdour, Nour
    Younes, Mutaz Bni
    Khwaileh, Enas
    Hammad, Mahmoud
    Al-Smadi, Mohammad
    2019 IEEE/ACS 16TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA 2019), 2019,
  • [4] Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic
    Mahmoud, Adnen
    Zrigui, Mounir
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2021, 18 (01) : 1 - 7
  • [5] Text classification for cognitive domains: A case using lexical, syntactic and semantic features
    Qiao, Chen
    Hu, Xiao
    JOURNAL OF INFORMATION SCIENCE, 2019, 45 (04) : 516 - 528
  • [6] A Semantic and Syntactic Similarity Measure for Political Tweets
    Little, Claire
    Mclean, David
    Crockett, Keeley
    Edmonds, Bruce
    IEEE ACCESS, 2020, 8 : 154095 - 154113
  • [7] Paraphrase identification using semantic heuristic features
    Ul-Qayyum, Zia
    Altaf, Wasif
    Research Journal of Applied Sciences, Engineering and Technology, 2012, 4 (22) : 4894 - 4904
  • [8] Contribution of Syntactic and Semantic Attributes in Paraphrase Identification
    Karaoglan, Bahar
    Kisla, Tarik
    Kumova Metin, Senem
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [9] Semantic Analysis for Paraphrase Identification using Semantic Role Labeling
    Lee, Eunji
    Lynn, Htet Myet
    Kim, Hyoungju
    Yeom, Soonja
    Kim, Pankoo
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 2135 - 2138
  • [10] Assessing sentence similarity through lexical, syntactic and semantic analysis
    Ferreira, Rafael
    Lins, Rafael Dueire
    Simske, Steven J.
    Freitas, Fred
    Riss, Marcelo
    COMPUTER SPEECH AND LANGUAGE, 2016, 39 : 1 - 28