Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features

Cited by: 67
Authors
Al-Smadi, Mohammad [1]
Jaradat, Zain [1]
Al-Ayyoub, Mahmoud [1]
Jararweh, Yaser [1]
Affiliation
[1] Jordan Univ Sci & Technol, Dept Comp Sci, POB 3030, Irbid 22110, Jordan
Keywords
Paraphrase identification; Semantic text similarity; Semantic analysis; Arabic language; Natural language processing;
DOI
10.1016/j.ipm.2017.01.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid growth in digital information has raised considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter share a great deal of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimental results show that the approach achieves good results in comparison to the baseline results. © 2017 Elsevier Ltd. All rights reserved.
Pages: 640-652
Page count: 13